AMAT 585: Practical Methods in Topological Data Analysis
Fall 2020, Class #7771
TTh 3:00-4:20, Zoom
Instructor: Michael Lesnick
mlesnick [at] albany [dot] [the usual thing]
Office Hours: By appointment.
About this Course:
This is the final course in a three-semester sequence on Topological Data
Analysis (TDA), aimed primarily at students in Albany's Data Science
MS program. This is a project-based course whose goal is to give students
hands-on experience with TDA, and with data analysis more broadly.
The course will be conducted entirely online, synchronously via Zoom.
Tentative Course Plan:
- In the first few course meetings, I will give a brief series of lectures
on a couple of topological and geometric data analysis tools not covered in my sections of
TDA I/II last year. Specifically, I will cover Mapper and multiparameter persistent homology
(e.g., RIVET, Hera).
- Next, each student will give a presentation on
a method for (mostly non-linear) dimensionality reduction. The methods we will
cover are PCA, MDS, Isomap, t-SNE, and UMAP.
- In the remainder of
the course, students will apply TDA, clustering, and
dimensionality reduction tools to synthetic and real data. First, we
will have a set of projects where I give students synthetic data sets with interesting
geometric structure. The students will use multiple tools to probe this
structure, and report on their findings in class and in a brief
written report.
- In the final part of the
semester, students will do a project exploring real data using these tools. The project will be chosen by the
student, with my input and consent. In the later parts of the course,
there will be little or no lecturing. Instead, students will provide
regular updates on their progress with the projects, and we will discuss any
questions or issues that arise. The course will conclude with student
presentations on their final projects. Students will be permitted to work on the
final projects in groups of up to three people if they choose, but I expect that the ground covered by a project will be
proportional to the number of people involved, and that
responsibilities will be clearly delineated, with my input.
Expectations:
Data analysis can be time-consuming: In addition to the interesting
stuff, one has to spend considerable time doing boring things like
installing software on one's computer, cleaning data, and
troubleshooting. With that in mind, students should
expect to devote substantial time every week to this course--roughly
8-10 hours, I'd estimate. Also, while I am generally available to help with mathematics
and data science questions, be warned that my availability to help
with technical computing issues (e.g., trouble installing software,
bugs in your code, etc.) is very limited, and you should expect to handle
such issues with little or no help from me.
Prerequisites:
Students are formally required to have either taken TDA I and II (AMAT
583/584) or to have permission of the instructor. In addition, you
are expected to have basic competence in programming and in using computers for data
analysis (including installing new software and teaching yourself how
to use it). Some knowledge of Python will be helpful.
Course Materials:
There will be no course textbook or other formal set of course
materials, but for the lecture portion of the course, I will make my (handwritten) lecture notes available.
Software:
Most or all of the TDA software we will use in this course can be found
on GitHub, under the tag "topological-data-analysis". There is quite a lot
there, and I will make more specific suggestions about what software
to use as the course progresses.
You may also find
scikit-learn to be useful for clustering and
dimensionality reduction.
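As a rough illustration of the kind of thing scikit-learn offers here, the sketch below applies two dimensionality reduction methods and one clustering method to standard synthetic data sets. The data sets, parameter values, and variable names are arbitrary choices made for this example, not course requirements or recommended settings.

```python
# Illustrative sketch only: data sets and parameter values are arbitrary
# choices for this example, not settings prescribed by the course.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles, make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# Dimensionality reduction: map a 3-D "Swiss roll" point cloud to 2-D.
X_roll, _ = make_swiss_roll(n_samples=500, noise=0.05, random_state=0)
X_pca = PCA(n_components=2).fit_transform(X_roll)  # linear projection
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X_roll)  # uses graph geodesic distances

# Clustering: two noisy concentric circles in the plane.
X_circ, _ = make_circles(n_samples=400, noise=0.03, factor=0.4, random_state=0)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X_circ)

# DBSCAN marks noise points with the label -1; exclude it when counting clusters.
n_clusters = len(set(labels) - {-1})

print("PCA embedding shape:   ", X_pca.shape)
print("Isomap embedding shape:", X_iso.shape)
print("DBSCAN clusters found: ", n_clusters)
```

Plotting the two embeddings side by side (for example with matplotlib) is a quick way to see how a linear and a non-linear method treat the same data differently.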
Recommended reading:
Grading:
The class will use the university's A-E grading scheme.
60%: Projects,
30%: Presentations,
10%: Attendance/Participation/Engagement
Academic Regulations:
Naturally, the University's Standards of Academic Integrity apply to
this course, and students are expected to be familiar with these.