# Courses

## Core Courses

The common core curriculum consists of the following 4 didactic and 2 elective courses. Most students entering the BIDS-TP will be expected to have completed multivariable Calculus and Linear Algebra. Those who have not are required to take our mathematical foundations course (Bioinf 501). All students are also required to take EECS 409: Data Science Seminar.

All students will be required to take 2 semesters of biomedical informatics (Bioinf 529 and Bioinf 580) and 2 semesters of data science courses (HS 650 and EECS 553). This core curriculum may be tailored to your individual goals and expertise by making one course substitution (see below).

Some students may need to take additional probability and statistics courses (e.g., Stats 425/426 or Biostat 601/602).

The core courses, and some of the available elective courses, are described below.

## Students are required to take 3 of the following 4 classes:

This Python-based course will introduce students to common topics in bioinformatics (Gibbs sampling, HMM, sequence alignment, phylogenetic trees, proteomic quantification, network and enrichment analysis) as well as corresponding computational approaches in those areas. Students will learn how to implement and apply various algorithms and statistical models to solve challenging problems and build a foundation for developing tools for future technologies.

The course covers signal processing and machine learning methods with an emphasis on their applications in healthcare. Students develop a basic understanding of the mathematical foundations of biomedical data analysis. Topics include: 1) transforms and feature extraction – Fourier transform, wavelet transform, fundamentals of information theory; 2) introduction to machine learning – clustering vs. classification, Naïve Bayes, classification and regression trees, random forests, support vector machines, introduction to neural networks, and sparse learning; 3) applications in medicine and biology.

This course provides a general overview of the principles, concepts, techniques, tools and services for managing, harmonizing, aggregating, preprocessing, modeling, analyzing and interpreting large, multi-source, incomplete, incongruent, and heterogeneous data (Big Data). It exposes students to common challenges related to handling Big Data and presents the enormous opportunities and power associated with our ability to interrogate such complex datasets, extract useful information, derive knowledge, and provide actionable forecasting [5]. Biomedical, healthcare, and social datasets will provide context for addressing specific driving challenges. Students learn about modern data analytic techniques and develop skills for importing and exporting, cleaning and fusing, modeling and visualizing, analyzing and synthesizing complex datasets. The collaborative design, implementation, sharing and community validation of high-throughput analytic workflows are emphasized throughout the course.

The goal of this course is to provide mathematical foundations for subsequent signal processing and machine learning courses, while also introducing matrix-based signal processing and machine learning methods/applications that are useful in their own right.

### One of the following classes may be substituted for 1 of the above 4:

BIOINF 593 (Machine Learning in Computational Biology) – may substitute for HS 650 or EECS 553

EECS 545 (Machine Learning (CSE)) or EECS 553 (Matrix Methods for Signal Processing, Data Analysis and Machine Learning) – may substitute for EECS 551

## Elective Courses

More than three dozen graduate Data Science and Bioinformatics courses are included as electives in the BIDS-TP Program. For brevity, we list just a few of them below to show the breadth and depth of the Program curriculum. The didactic portion of each individual curriculum plan will be tailored around the above core and these elective courses. Some electives also fulfill primary department curriculum requirements.

### Bioinf 501 (Mathematical Foundations of Bioinformatics)

Required for students without linear algebra/multivariable calculus.

This course covers some of the fundamental mathematical techniques commonly used in bioinformatics and biomedical research. These include: (1) principles of multivariable calculus and complex numbers/functions, (2) foundations of linear algebra, such as linear spaces, eigenvalues and eigenvectors, singular value decomposition, spectral graph theory, and Markov chains, (3) differential equations and their usage in biomedical systems, including topics such as existence and uniqueness of solutions, bifurcations in one- and two-dimensional systems, and cellular dynamics, and (4) optimization methods, such as free and constrained optimization, Lagrange multipliers, data denoising using optimization, and heuristic methods.

### Bioinf 523 (Introductory Biology for Computational Scientists)

Introduces basic biology to STEM graduate students without any prior college biology training. Geared towards students in Bioinformatics, Biostatistics, or other computational fields who have quantitative training (computer science, engineering, mathematics, statistics, etc.). Covers major topics related to biomedical research, including organic chemistry and biochemistry, molecular biology, genetics, cell biology, and microbiology.

### Bioinf 593/EECS 598 (Machine Learning in Computational Biology)

Computational biology is a rich and growing field featuring large, complex, and noisy datasets. This exciting area both draws upon the techniques of machine learning for scientific discovery and offers challenging problems that push the boundaries of machine learning.

We will introduce the foundational machine learning techniques used in computational biology and describe their applications to biological data. Key topics include linear and nonlinear dimension reduction; deep learning for non-Euclidean data types, such as trees, graphs, and manifolds; deep generative models; and other unsupervised learning approaches. The course covers theoretical foundations and practical implementation of the techniques, in addition to the biological background needed for computational biology applications.

Expertise in programming, calculus, linear algebra, and probability is required.

### Bioinf 602/603 (Bioinformatics Journal Club)

Bioinformatics Journal Club entails a weekly discussion of current and classic papers concerning biology on a whole-genome scale, or using genome-sequence-based approaches. It is a great opportunity for students and researchers to be exposed to current topics in bioinformatics. Although the presentations are on a volunteer basis, participants are encouraged to present. Each week's paper is chosen by the presenter a week in advance.

Journal Club is open to anyone interested in participating. This course is for first-year students who have not taken a journal club before. Presentation is required only for BIOINF 603.

### Biostat 601 (Probability and Distribution Theory)

This course covers combinatorial analysis, sample spaces, events, and set operations, axioms of probability, properties of probability functions, conditional probability and independence, random variables, distributions, densities, expectation, and variance. It also includes applications of moment generating functions, convergence theorems, joint distribution modeling, sampling distributions, and different modes of convergence (almost sure, in probability, in distribution).

### Biostat 602 (Biostatistical Inference)

This course provides students with a deep understanding of key concepts of data analytics and statistical inference. Various methods to properly process data, organize information, and quantify uncertainty are presented as solutions to substantive health questions. This course covers both statistical estimation and inference, including point estimation, confidence interval estimation, hypothesis testing and basic asymptotic theory. The primary focus will be on the frequentist school of statistical estimation and inference, with some basics of Bayesian inference.

### EECS 545 (Advanced Machine Learning)

This course is focused on statistical machine learning and reinforcement learning methods that have had a major impact on the machine learning field over the past decade. It also attends to the problem of connecting these representations to the symbolic knowledge representation methods that have been at the core of Artificial Intelligence (AI).

### EECS 551 (Matrix Methods for Signal Processing, Data Analysis and Machine Learning)

Theory and application of matrix methods to signal processing, data analysis and machine learning. Theoretical topics include subspaces, eigenvalue and singular value decomposition, projection theorem, constrained, regularized and unconstrained least squares techniques and iterative algorithms. Applications include image deblurring, ranking of webpages, image segmentation and compression, social networks, circuit analysis, recommender systems and handwritten digit recognition.

### LHS 610 (Exploratory Data Analysis for Health)

Students in this course will learn foundational topics in data science focused on healthcare data. The course is based on two large themes: (a) understanding and becoming familiar with healthcare data, and (b) making inferences based on data. Students will develop a working understanding of R, one of the most widely used languages for data science, and an introductory understanding of several other tools used in analyzing healthcare data. Students will participate in a longitudinal group project spanning the principles learned during the course using real-life healthcare data sets.

### Math 571 (Numerical Linear Algebra)

This course covers vector and matrix norms, orthogonal matrices, projectors, and singular value decomposition (SVD). It also covers least squares problems, QR factorization, normal equations, Gram-Schmidt orthogonalization, Householder triangularization, and conjugate gradient methods. The applications include image compression, a finite-difference scheme for a two-point boundary value problem, the Dirichlet problem for the Laplace equation, and least squares fitting.

### Stats 425/Math 425 (Introduction to Probability)

This course introduces students to both useful and interesting ideas from the mathematical theory of probability and to a number of applications of probability to a variety of fields including genetics, economics, geology, business, and engineering. The theory developed, together with other mathematical tools such as combinatorics and calculus, is applied to everyday problems. Concepts, calculations, and derivations are emphasized.

### Stats 426 (Introduction to Theoretical Statistics)

This course introduces probability theory, random walks, discrete counting and Poisson processes, Markov chains, and Monte Carlo simulations. It also covers discrete and continuous time, equations for stationary distributions, and an introduction to Brownian motion. Selected applications include branching processes, financial modeling, genetic models, the inspection paradox, inventory and queuing problems, prediction, and/or risk analysis.

### Stats 503 (Applied Multivariate Analysis)

Topics covered include principal components analysis and other dimension reduction techniques, classification (discriminant analysis, decision trees, nearest neighbor classifiers, logistic regression, support vector machines, ensemble methods), clustering (agglomerative and partitioning methods, model-based methods), and categorical data analysis. The focus is on modern multivariate data analysis methods, how to use them, and when they should and should not be applied.