AMAT 585: Practical Methods in Topological Data Analysis
Fall 2023, Class #7604
Monday, Wednesday 11:40-1:00, Humanities 115
Instructor: Michael Lesnick
mlesnick [at] albany [dot] [the usual thing]
Office Hours: Monday, Wednesday 4:30-5:30, and by appointment.
About this Course:
This is the third course in a three-semester sequence on Topological Data
Analysis (TDA), aimed primarily at students in Albany's Data Science
MS program. It is a project-based course whose goal is to give students
hands-on experience with TDA, and with data analysis more broadly.
The course centers on a single semester-long project.
This is a fully in-person course, and students are required to attend. I do understand that situations may occasionally arise that make attendance impossible, and I intend to be flexible.
Tentative Course Plan:
- In the first few course meetings, I will give a brief series of lectures reviewing geometric and topological data analysis tools and discussing a few tools not covered in my sections of
TDA I/II last year. One main focus will be multiparameter persistent homology.
- Next, each student will give two presentations, one on a method for dimensionality reduction, and one on a clustering method. The methods covered may include PCA, MDS, Isomap, t-SNE, UMAP, k-means clustering, agglomerative clustering (e.g., single linkage, average linkage), Gaussian mixtures, spectral clustering, and density-based clustering (e.g., DBSCAN, ToMATo).
- Concurrently, starting from the very beginning of the course, students will begin researching and selecting the main topic for their final project. Once the lectures and presentations mentioned above are complete, in the remainder of the course, each student will do a (brief) weekly presentation, providing updates on their progress with their project. We will discuss any questions or issues that arise.
The course will conclude with student presentations on their final projects. Written reports on the final projects will also be submitted.
Project Details:
The project will be chosen by the student, with my input and consent. Most projects will involve the application of topological and geometric tools to real world data. Projects focused either on supervised learning or on exploratory data analysis are acceptable. A project focused on algorithms or on theoretical questions is also an option.
Students will be permitted to work on the final projects in groups if they choose, but I expect that the ground covered by a project will be
proportional to the number of people involved, and that
responsibilities will be clearly and evenly delineated.
Expectations:
This course will require substantial time, independence, and academic maturity. Students should be aware that data analysis can be time-consuming: in addition to the interesting
stuff, one has to spend considerable time doing boring things like
installing software on one's computer, cleaning data, debugging,
troubleshooting, and waiting for long computations to finish. In addition, I expect that most students will work with data from real world applications for their projects; substantial time will be required to study the relevant literature and learn the application.
With that in mind, students should expect to devote substantial time to this course. Also, while I am generally available to help with mathematics
and data science questions, be warned that my availability to help
with technical computing issues (e.g., trouble installing software,
bugs in your code, etc.) is very limited, and you should expect to handle
such issues with little or no help from me. Moreover, depending on the application area you choose to study, I may not be of much help with domain-specific questions.
Prerequisites:
Students are formally required to have either taken TDA I and II (AMAT
583/584) or to have permission of the instructor. In addition, you
are expected to have basic competence in programming and in using computers for data
analysis at the level of AMAT 502.
Course Materials:
There will be no course textbook or other formal set of course
materials, but for the lecture portion of the course, I will make my (handwritten) lecture notes available.
Software:
Much of the TDA software we will use in this course can be found
on GitHub, under the tag "topological-data-analysis". There is quite a lot
there, and I will make more specific suggestions about what software
to use as the course progresses.
You might also find
scikit-learn to be useful for clustering and
dimensionality reduction.
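As a rough illustration of the kind of workflow scikit-learn supports, here is a minimal sketch that reduces a point cloud to two dimensions with PCA and then clusters it with k-means. The synthetic data from make_blobs, the choice of PCA and k-means, and the parameter values (three clusters, two components) are assumptions made purely for illustration; they are not course requirements, and a real project would substitute its own data and methods.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for real data (an assumption for this sketch):
# 300 points in 10 dimensions, drawn around 3 centers.
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# Dimensionality reduction: project onto the first two principal components.
embedding = PCA(n_components=2).fit_transform(X)

# Clustering: k-means with k = 3, run on the reduced data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(embedding)

print(embedding.shape)           # (300, 2)
print(np.bincount(labels))       # number of points assigned to each cluster
```

Most other scikit-learn estimators for clustering and dimensionality reduction follow the same fit_transform / fit_predict pattern, so swapping in a different method usually changes only the line that constructs the estimator.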
Recommended reading (a very incomplete list):
Grading:
The class will use the university's A-E grading scheme.
50%: Final Project (including both oral and written components),
30%: Other Presentations,
20%: Attendance/Participation/Engagement
Academic Regulations:
Naturally, the University's Standards of Academic Integrity apply to
this course, and students are expected to be familiar with these. Pay particular attention to the sections on plagiarism.