Bachelor's Programme "Sociology and Social Informatics"

Probability Theory and Mathematical Statistics

Academic year: 2019/2020
Language of instruction: English
Credits: 6
Status: compulsory course
When taught: year 1, modules 3 and 4

Instructor

Course Syllabus

Abstract

This course introduces the basic concepts of probability theory and statistics, with an emphasis on practical problems. It is divided into two main parts: Probability Theory (Module 3, year 1) and Statistics (Module 4, year 1, and Module 1, year 2). Topics include combinatorics, conditional probability, random variables, limit laws, statistical point estimation, and hypothesis testing. The main topics are also illustrated and studied using statistical software such as R, Excel, and Mathematica.
Learning Objectives

  • Studying the theoretical foundations of probability theory and mathematical statistics and applying them in practice by constructing mathematical models and solving statistical problems
  • Understanding the types of practical problems, including those arising in sociology, that can be solved with statistical methods, and being able to apply the knowledge gained to solve them
  • Ability to work with software for mathematical calculations
  • Deepening and broadening knowledge of applied mathematical methods
  • Mastering modern methods of data analysis, for example basic data exploration skills using statistical packages such as R (S-PLUS)
Expected Learning Outcomes

  • Is able to describe a random experiment and the set of all outcomes.
  • Is able to apply the classical formula of probability.
  • Knows how to apply the Bernoulli formula.
  • Knows the law of total probability.
  • Is able to calculate posterior probabilities by Bayes’ formula.
  • Is able to construct random variables describing a given random experiment.
  • Is able to calculate the expected value and variance of these random variables.
  • Knows the main families of discrete and continuous random variables.
  • Is able to calculate probabilities for a normally distributed random variable.
  • Is able to approximate probabilities involving a large number of similar events using the CLT.
  • Knows what a population is and what a sample from a population is.
  • Is able to calculate the sample mean, sample variance, unbiased sample variance, sample proportion and quantiles.
  • Is able to calculate the sample mean, sample variance, unbiased sample variance, sample proportion and quantiles in R.
  • Knows how to construct confidence intervals for means and proportions.
  • Is able to show the connection between confidence intervals and the CLT.
  • Knows when and why one should use Student's t-distribution instead of the normal distribution for confidence intervals.
  • Knows the main approaches to hypothesis testing.
  • Is able to formulate the null and the alternative hypotheses.
  • Is able to make a statistical inference using the significance level or the p-value.
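
As an illustration only, three of the probability outcomes listed above (the Bernoulli formula, Bayes' formula, and probabilities for the normal distribution) can be sketched in a few lines of Python. The syllabus itself names R, Excel, and Mathematica, so the choice of Python is an assumption made here, and the function names and input numbers are hypothetical examples rather than course material.

```python
from math import comb
from statistics import NormalDist


def bernoulli_formula(n: int, k: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def bayes_posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H | E) by Bayes' formula.

    The denominator P(E) comes from the law of total probability
    over the two hypotheses H and not-H.
    """
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    return prior * p_e_given_h / p_e


def normal_interval_probability(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X ~ Normal(mu, sigma), via the CDF."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)


# Hypothetical inputs, chosen only to exercise the functions.
print(bernoulli_formula(10, 3, 0.5))            # exactly 3 heads in 10 fair tosses
print(bayes_posterior(0.01, 0.95, 0.05))        # posterior after one positive test
print(normal_interval_probability(-1, 1, 0, 1)) # mass within one sigma of the mean
```

The last call returns roughly 0.68, which is the first number in the 68-95-99.7 rule.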
Course Contents

  • Introduction to Probability. Independence. Bernoulli trials.
    Introduction to Probability
      • History of Probability
      • Probability and Data Analysis
      • Random experiment. Outcomes. Events.
      • Operations with events
      • Statistical definition of probability
    Properties of Probability
      • Axiomatic definition of probability
      • Classical formula of probability for equally likely outcomes
      • Inclusion-exclusion formula
      • Hypergeometric distribution formula
      • Birthday paradox
      • Independent and dependent events
    Bernoulli trials
      • Independent experiments
      • Formula for the number of successes. Proof.
      • Banach's matchbox problem
      • Probability of the first success in the k-th trial
  • Conditional probability. Bayes formula.
    Conditional probability
      • Definition. Illustration via contingency tables
      • Multiplication formula for probabilities
      • Independence via conditional probability
      • Generalization of the multiplication formula for k events. Chain rule.
    Law of Total Probability
      • Law of total probability for two events. Proof.
      • Group of jointly exhaustive events
      • Generalization of the law of total probability
      • Monty Hall paradox
    Bayes' Theorem
      • Bayes' formula
      • Prior and posterior probabilities
      • Example from a bookmaking company
      • Odds ratio
      • Conditional independence
  • Discrete and continuous random variables. Their numerical characteristics.
    Discrete random variables
      • Definition. Main properties
      • Probability mass function
      • Binomial random variable. Binomial formula in mathematical analysis.
      • Geometric random variable
      • Hypergeometric random variable
      • Poisson random variable. Rare events. Poisson limit theorem
    Numerical characteristics of random variables
      • Expectation. Variance. Standard deviation. Their properties.
      • K-th moment
      • Mode. Median.
      • St. Petersburg paradox
    Continuous random variables
      • Cumulative distribution function
      • Density of a continuous random variable
      • Uniform, Gaussian (Normal), Exponential random variables
      • Formulas for expectation and variance
      • Independence and dependence of random variables. Joint distribution.
  • Normal distribution and Limit laws: CLT, LLN.
    Normal Distribution
      • Family of normal distributions
      • Standardization. Z-score
      • 68-95-99.7 rule (3-sigma rule)
    Limit theorems
      • CLT. Examples
      • LLN. Examples
    More limit theorems
      • CLT for proportions
      • De Moivre-Laplace theorem
  • Introduction to statistical analysis. Sample. Statistical population. Point estimation.
    Statistics. Preliminaries
      • Main definitions: population, sample, representative sample
      • Frequency histogram
      • Empirical distribution function
      • Probability vs Statistics
    Properties of point estimates
      • Unbiasedness
      • Consistency
      • Quantiles
      • Sample mean, sample variance, unbiased sample variance, sample proportion
      • Outliers
      • Correlation
      • Introductory seminar for R
  • Interval estimation
    Confidence intervals
      • Point estimates vs interval estimates
      • Confidence interval for a mean (variance is known)
      • Student's t-distribution
      • Confidence interval for a mean (variance is not known)
      • One-sided confidence intervals
      • Confidence intervals for proportions
  • Hypothesis testing
    Statistical Hypothesis Testing
      • Statistical hypothesis
      • Statistical test
      • Type 1 and Type 2 errors
      • Significance level. Rejection region.
      • P-value
    Hypotheses for a mean of normally distributed data
      • One-sample Z-test. Connection with the CLT.
      • Two-sample T-test
      • Applications: A/B testing, model validation problems, double-blind experiment
      • Connection with confidence intervals
    Homogeneity hypothesis
      • Mann-Whitney rank test
      • Two-sample Kolmogorov test
      • Applications: A/B testing, model validation problems, double-blind experiment
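
To make the statistics topics above more concrete, here is a minimal sketch of point estimates, a confidence interval for a mean with known variance, and a one-sample Z-test. It is written in Python as an assumption (the course itself uses R), the data are made up, and the helper names `z_confidence_interval` and `z_test_p_value` are hypothetical. "Sample variance" is taken here to mean the version that divides by n, and "unbiased sample variance" the version that divides by n - 1.

```python
from math import sqrt
from statistics import NormalDist, mean, pvariance, variance


def z_confidence_interval(sample, sigma, level=0.95):
    """Two-sided confidence interval for the mean when the population
    standard deviation sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g. about 1.96 for 95%
    centre = mean(sample)
    half_width = z * sigma / sqrt(len(sample))
    return centre - half_width, centre + half_width


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value of the one-sample Z-test for H0: mean == mu0."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up measurements, used only for illustration.
sample = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 5.1]

print("sample mean:", mean(sample))
print("sample variance (divides by n):", pvariance(sample))
print("unbiased sample variance (divides by n - 1):", variance(sample))

low, high = z_confidence_interval(sample, sigma=0.2)
p_value = z_test_p_value(sample, mu0=5.0, sigma=0.2)
print("95% CI for the mean:", (low, high))
print("p-value for H0: mean = 5.0:", p_value)
```

With these numbers the hypothesised mean 5.0 lies inside the 95% interval and the p-value is above 0.05, which shows the connection between confidence intervals and two-sided tests listed in the topics. When the variance is unknown, the normal quantile would be replaced by a Student's t quantile, which the Python standard library does not provide.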
Assessment Elements

  • non-blocking Activity
  • non-blocking Test 1
  • non-blocking Test 2
  • non-blocking Class assignments
  • non-blocking Self-study report
  • non-blocking Final Exam
    The exam is held in written form with asynchronous proctoring. It takes place on the et.hse.ru Moodle platform, with proctoring on the Examus platform (https://hse.student.examus.net). A system check is available on the Examus platform. The student's computer must meet the following requirements: https://elearning.hse.ru/data/2020/05/07/1544135594/Технические%20требования%20к%20ПК%20студента.pdf

    The exam is held in written form on 11 June 2020 at 10:30. Students must connect 15 minutes before the exam starts. The exam lasts 60 minutes. Students are given 6-7 problems uploaded to Moodle. For each problem, the student must provide a complete solution and state the answer.

    To take part in the exam, the student must:
      • log in to the proctoring platform in advance (one week before the exam)
      • run the system check (one week before the exam)
      • turn on the camera and microphone
      • confirm their identity

    During the exam, students are forbidden to:
      • communicate (on social networks or with people in the room)
      • use a mobile phone
      • use internet search, Wikipedia, etc.
      • use lecture slides and videos

    Students are allowed to use a list of the main formulas written on a single sheet of paper.

    A short-term connection failure during the exam is an interruption of less than 10 minutes. A long-term connection failure is an interruption of 10 minutes or more. After a long-term connection failure the student cannot continue the exam. The student must document the failure by taking a screenshot (PrintScreen) of the technical problem and sending it to the study office. The retake procedure uses more difficult problems.
Interim Assessment

  • Interim assessment (Module 4)
    0.11 * Activity + 0.18 * Class assignments + 0.25 * Final Exam + 0.1 * Self-study report + 0.18 * Test 1 + 0.18 * Test 2
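
The weighting above can be checked with a short sketch. The weights are taken directly from the formula. Everything else is an assumption for illustration: the function name `interim_grade` is hypothetical, the example scores are made up, a 10-point scale is assumed, and no rounding rule is applied because the syllabus does not state one.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {
    "Activity": 0.11,
    "Class assignments": 0.18,
    "Final Exam": 0.25,
    "Self-study report": 0.10,
    "Test 1": 0.18,
    "Test 2": 0.18,
}


def interim_grade(scores: dict) -> float:
    """Weighted sum of the assessment elements (unrounded)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores on an assumed 10-point scale.
example_scores = {
    "Activity": 8,
    "Class assignments": 7,
    "Final Exam": 6,
    "Self-study report": 9,
    "Test 1": 5,
    "Test 2": 7,
}
print(interim_grade(example_scores))
```

The weights sum to 1, so a student with the same score on every element receives that score as the interim grade.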
Bibliography

Recommended Core Bibliography

  • Deep, R. (2006). Probability and Statistics: With Integrated Software Routines. Amsterdam: Academic Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=196153
  • Young, G. A., & Smith, R. L. (2005). Essentials of Statistical Inference. Cambridge: Cambridge University Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=138968

Recommended Additional Bibliography

  • Bruce, P. C. (2014). Introductory Statistics and Analytics: A Resampling Perspective. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=923330