M08 – Hypothesis Testing: CFAI Practice Problems

Source: CFAI CFA1 Quant Practice 2026, pp. 236–241

Question 1

An analyst suspects that a fund’s excess returns are less than 5%. The most appropriate hypotheses to test this are:

A. $H_{0} : μ = 5\%$ vs. $H_{a} : μ \neq 5\%$
B. $H_{0} : μ \geq 5\%$ vs. $H_{a} : μ < 5\%$
C. $H_{0} : μ \leq 5\%$ vs. $H_{a} : μ > 5\%$

Answer

B. $H_{0} : μ \geq 5\%$ vs. $H_{a} : μ < 5\%$

The analyst’s suspicion (the claim she wants to establish) is that excess returns are less than 5%. In hypothesis testing, the condition the analyst wants to demonstrate evidence for is placed in the alternative hypothesis $H_{a}$ .

This is a one-tailed (left-tail) test because the alternative specifies a direction (less than).

Why A is wrong: A two-tailed test ( $H_{a} : μ \neq 5\%$ ) is appropriate when the analyst suspects the mean differs from 5% in either direction, with no prior directional belief. Here the analyst has a directional suspicion (less than), so a one-tailed test is more appropriate and, for the same $α$, more powerful against alternatives in that direction.

Why C is wrong: $H_{a} : μ > 5\%$ is a right-tail test, the opposite direction from the analyst’s suspicion.

Decision rule for B: Reject $H_{0}$ if the test statistic falls in the left tail, i.e., test statistic $< - t_{α}$ (or $< - z_{α}$ for large samples).
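
The left-tail decision rule can be sketched in a few lines of Python. This is a minimal illustration that assumes a large sample, so the standard normal cutoff $-z_{α}$ applies; the function name `left_tail_reject` and the test statistics passed to it are hypothetical.

```python
from statistics import NormalDist

def left_tail_reject(test_stat: float, alpha: float = 0.05) -> bool:
    """Return True if H0: mu >= mu0 is rejected in favour of Ha: mu < mu0."""
    # inv_cdf(alpha) is the negative cutoff -z_alpha (about -1.645 for alpha = 0.05)
    critical = NormalDist().inv_cdf(alpha)
    return test_stat < critical

# Hypothetical test statistics, chosen only for illustration
print(left_tail_reject(-2.10))  # True: falls in the left-tail rejection region
print(left_tail_reject(-1.20))  # False: not extreme enough
print(left_tail_reject(2.50))   # False: positive values never reject in a left-tail test
```

For small samples the same comparison would use $-t_{α}$ with the appropriate degrees of freedom instead of the normal cutoff.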

📖 Detailed explanation

Concept review: In hypothesis testing, what the analyst wants to demonstrate is always placed in $H_{a}$ (the alternative hypothesis). $H_{0}$ is the default (status quo) hypothesis, assumed true until there is strong enough evidence to reject it.

Why B is correct: The analyst suspects that excess returns are less than 5%. This is the direction she wants to demonstrate, so $H_{a} : μ < 5\%$. For $H_{0}$ and $H_{a}$ to cover every possible value, $H_{0} : μ \geq 5\%$. This is a one-tailed (left-tail) test because $H_{a}$ specifies a single direction.

Why A is wrong: $H_{a} : μ \neq 5\%$ is a two-tailed test, used only when there is no prior directional belief. Here the analyst has a clear direction in mind (less than), so a one-tailed test is more appropriate and has more power in that direction.

Why C is wrong: $H_{a} : μ > 5\%$ is a right-tail test, the opposite of the analyst’s suspicion (left tail).

Question 2

Which of the following is correct about hypothesis testing?

A. The null hypothesis is the condition the researcher hopes to support
B. The alternative hypothesis is the proposition considered true without contrary evidence
C. The alternative hypothesis exhausts all parameter values not covered by the null hypothesis

Answer

C. The alternative hypothesis exhausts all parameter values not covered by the null hypothesis

Together, $H_{0}$ and $H_{a}$ must cover all possible values of the parameter — they are mutually exclusive and collectively exhaustive. Any value of the parameter that is not included in the null must be covered by the alternative.

Why A is wrong: The null hypothesis is what the researcher starts with as true (the default/status quo position). It is the alternative hypothesis that the researcher hopes to find evidence to support. Researchers design tests to reject $H_{0}$ in favour of $H_{a}$ .

Why B is wrong: The proposition “considered true without contrary evidence” is the null hypothesis — not the alternative. The null is maintained unless sufficient statistical evidence exists to reject it.

Summary of roles:

	Null $H_{0}$	Alternative $H_{a}$
Default assumption	Yes, assumed true initially	No
Researcher’s goal	To reject	To support
Contains equality	Always	Never (uses $<$, $>$, or $\neq$)

📖 Detailed explanation

Concept review: The null hypothesis ($H_{0}$) and the alternative hypothesis ($H_{a}$) must together cover every possible value of the parameter (collectively exhaustive) without overlapping (mutually exclusive).

Why C is correct: $H_{a}$ contains every value not in $H_{0}$. For example, if $H_{0} : μ = 5\%$, then $H_{a} : μ \neq 5\%$ covers all remaining values, so no case is left out.

Why A is wrong: $H_{0}$ is the default (status quo) hypothesis that the researcher initially assumes to be true. It is $H_{a}$ that the researcher wants to find evidence for; option A reverses the roles of the two hypotheses.

Why B is wrong: “Considered true without contrary evidence” describes $H_{0}$, not $H_{a}$. $H_{0}$ is maintained unless there is sufficiently strong statistical evidence to reject it.

Question 3

Which of the following is correct about the null hypothesis?

A. It can be a “not equal to” statement if the alternative is an “equal to” statement
B. Along with the alternative hypothesis, it covers all possible parameter values
C. In a two-tailed test, it is rejected when evidence supports equality of the parameter to the hypothesised value

Answer

B. Along with the alternative hypothesis, it covers all possible parameter values

The null and alternative hypotheses together form an exhaustive partition of the parameter space — every possible value of the parameter falls under exactly one of the two hypotheses.

Why A is wrong: The null hypothesis always includes the equality sign (e.g., $=$ , $\leq$ , or $\geq$ ). A null of “not equal to” is never used in standard hypothesis testing because the null represents the default or status quo position, which requires a specific (or boundary) value.

Why C is wrong: A two-tailed test rejects $H_{0}$ when there is strong evidence that the parameter differs from the hypothesised value (i.e., the test statistic falls far in either tail). When evidence supports equality — meaning the test statistic is close to zero — we fail to reject $H_{0}$ .

📖 Detailed explanation

Concept review: The null hypothesis must always contain an equality (=, ≤, or ≥) because the test statistic is computed against a specific hypothesised value. Together, $H_{0}$ and $H_{a}$ must cover the entire parameter space.

Why B is correct: This is the basic definition. $H_{0}$ and $H_{a}$ together form a complete partition of the parameter space.

Why A is wrong: $H_{0}$ must always contain an equality (=, ≤, ≥). $H_{0}$ is never written with a “not equal to” sign ($\neq$), because the test statistic cannot be computed unless $H_{0}$ supplies a specific value to compare against.

Why C is wrong: A two-tailed test rejects $H_{0}$ when the evidence shows that the parameter differs from the hypothesised value (the test statistic lies far out in either tail). When the evidence supports equality (test statistic close to 0), we fail to reject $H_{0}$; we do not reject it.

Question 4

Regarding a one-tailed hypothesis test, which of the following is correct?

A. The rejection region increases in size as the significance level becomes smaller
B. A one-tailed test more strongly reflects the prior beliefs of the researcher than a two-tailed test
C. The absolute value of the critical value in a one-tailed test is larger than in a two-tailed test

Answer

B. A one-tailed test more strongly reflects the prior beliefs of the researcher than a two-tailed test

A one-tailed test embeds a directional hypothesis — the researcher has a prior belief about which direction the parameter deviates from the null. This is a stronger statement than a two-tailed test, which only asks “is it different?” without specifying direction.

Why A is wrong: A smaller significance level $α$ means a smaller rejection region (stricter standard for rejection), not a larger one. For example, $α = 0.01$ has a smaller critical region than $α = 0.05$ .

Why C is wrong: For the same $α$ , the absolute critical value in a one-tailed test is smaller than in a two-tailed test:

Test	Critical value at $α = 0.05$ (standard normal)
One-tailed	$+1.645$ (right tail) or $-1.645$ (left tail)
Two-tailed	$\pm 1.960$

In a one-tailed test, all of the $α$ probability is in one tail, so the cutoff is less extreme than when $α/2$ is placed in each tail for a two-tailed test.
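
The gap between one-tailed and two-tailed cutoffs can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is an illustrative assumption, and the result applies to $z$-tests only.

```python
from statistics import NormalDist

def z_critical(alpha: float, two_tailed: bool) -> float:
    """Positive standard normal critical value for a given significance level."""
    # A two-tailed test puts alpha/2 in each tail; a one-tailed test puts all of alpha in one tail.
    tail_probability = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_probability)

for alpha in (0.10, 0.05, 0.01):
    one = z_critical(alpha, two_tailed=False)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha={alpha:.2f}  one-tailed={one:.3f}  two-tailed={two:.3f}")
```

At every $α$ the one-tailed cutoff is the smaller of the two, and both cutoffs grow as $α$ shrinks, which is why a smaller significance level means a smaller rejection region.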

📖 Detailed explanation

Concept review: The critical value is the threshold the test statistic must pass to reject $H_{0}$. In a one-tailed test the entire probability $α$ sits in one tail, so the critical value is closer to the centre of the distribution than in a two-tailed test, which makes rejection easier in the hypothesised direction.

Why B is correct: A one-tailed test expresses a directional belief. The researcher already has a prior belief about the direction of the effect, which a two-tailed test does not assume.

Why A is wrong: A smaller significance level means a smaller rejection region (a stricter standard). $α = 0.01$ has a smaller rejection region than $α = 0.05$.

Why C is wrong: For the same $α$, the absolute critical value of a one-tailed test is smaller than that of a two-tailed test. For example, at $α = 0.05$ the one-tailed $z = 1.645$ is less than the two-tailed $z = 1.960$. A one-tailed test places all of $α$ in one tail, so the cutoff is closer to the centre.

Question 5

A test uses a 5% significance level. The corresponding confidence level is:

A. 2.5%
B. 5%
C. 95%

Answer

C. 95%

The confidence level is the complement of the significance level:

$\text{Confidence level} = 1 - α = 1 - 0.05 = 0.95 = 95\%$

Intuition: A 5% significance level means we accept a 5% probability of incorrectly rejecting a true null hypothesis (Type I error), so a true null is retained with probability 95%. For a two-tailed test, the corresponding 95% confidence interval for the parameter includes the hypothesised value exactly when the test fails to reject at the 5% level.
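
The link between a two-tailed test and its confidence interval can be made concrete. This is a minimal sketch assuming a $z$-test with known population standard deviation; the function name `two_sided_z_test` and all input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return (z statistic, (1 - alpha) confidence interval, reject flag)."""
    se = sigma / sqrt(n)                          # standard error of the mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z = (xbar - mu0) / se
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    return z, ci, abs(z) > z_crit

# Hypothetical sample: mean 5.8, sigma 2.0, n = 36, testing H0: mu = 5.0
z, ci, reject = two_sided_z_test(xbar=5.8, mu0=5.0, sigma=2.0, n=36)
mu0_in_ci = ci[0] <= 5.0 <= ci[1]
print(reject, mu0_in_ci)  # True False: the test rejects, and 5.0 lies outside the interval
```

Whatever inputs are used, `reject` and `mu0_in_ci` always take opposite values, which is the consistency between the test and the interval described above.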

📖 Detailed explanation

Concept review: The significance level ($α$) and the confidence level are two sides of the same coin: confidence level = $1 - α$. The two are closely linked, and a confidence interval and the corresponding hypothesis test give consistent results.

Why C is correct: $1 - α = 1 - 0.05 = 0.95 = 95\%$. A 95% confidence level means that, under repeated sampling, 95% of the confidence intervals constructed would contain the true parameter value.

Why A is wrong: 2.5% is $α/2$, the probability in each tail of a two-tailed test, not the confidence level.

Why B is wrong: 5% is the significance level $α$ itself, not the confidence level. The confidence level is always the complement of $α$.

Question 6

A hypothesis test for a normal population at a 0.05 significance level implies:

A. a 95% probability of rejecting a true null hypothesis
B. a 95% Type I error for a two-tailed test
C. a 5% critical value rejection region for a one-tailed test

Answer

C. a 5% critical value rejection region for a one-tailed test

At significance level $α = 0.05$ , the rejection region contains exactly 5% of the probability under the null. For a one-tailed test, all 5% is placed in one tail (left or right), defined by the critical value (e.g., $z = - 1.645$ for a left-tailed test).

Why A is wrong: The significance level defines the probability of rejecting a true null (Type I error) as 5%, not 95%. A 95% probability of rejection would correspond to $α = 0.95$ , which is an absurdly large Type I error.

Why B is wrong: The Type I error probability is $α = 5\%$, not 95%. For a two-tailed test, this 5% is split as 2.5% in each tail, but the total Type I error probability remains 5%.

📖 Detailed explanation

Concept review: A significance level of $α = 0.05$ means the probability of a Type I error (rejecting a true $H_{0}$) is 5%. The rejection region contains exactly $α = 5\%$ of the area of the distribution under $H_{0}$.

Why C is correct: With $α = 0.05$, a one-tailed test has a rejection region of 5% in one tail, defined by the critical value (for example, $z = -1.645$ for the left tail). This is an accurate description.

Why A is wrong: The probability of rejecting a true $H_{0}$ is 5% ($= α$), not 95%. The 95% figure is the probability of not rejecting a true $H_{0}$ ($= 1 - α$).

Why B is wrong: The Type I error probability is $α = 5\%$, not 95%. In a two-tailed test the 5% is split evenly into 2.5% per tail, but the total is still 5%.

Question 7

A test statistic is best described as the basis for deciding whether to:

A. reject the null hypothesis
B. accept the null hypothesis
C. reject the alternative hypothesis

Answer

A. reject the null hypothesis

The test statistic is a standardised measure computed from sample data that is compared against a critical value (or used to compute a p-value) to determine whether to reject $H_{0}$ . The decision rule is: reject $H_{0}$ if the test statistic falls in the rejection region.

Why B is wrong: In formal hypothesis testing, we never “accept” $H_{0}$ — we only fail to reject it. Failing to reject does not mean $H_{0}$ is true; it means there is insufficient evidence to conclude it is false.

Why C is wrong: The test statistic is used to evaluate the null, not the alternative. Rejecting $H_{0}$ provides evidence in favour of $H_{a}$ , but the decision is framed as “reject or fail to reject $H_{0}$ .”

📖 Detailed explanation

Concept review: A test statistic is a value calculated from sample data and standardised so that it can be compared with a known reference distribution (such as $z$ or $t$). It is the bridge between the observed data and the statistical decision: it is compared with a critical value or used to compute a p-value.

Why A is correct: The test statistic is the basis for deciding whether to reject $H_{0}$, either by comparison with the critical value (reject if it is exceeded) or through the p-value (reject if it is below $α$).

Why B is wrong: In hypothesis testing we never “accept” $H_{0}$; we only “fail to reject” it (there is not enough evidence to reject). Failing to reject does not mean $H_{0}$ is true, only that the evidence is not strong enough.

Why C is wrong: The test statistic is used to evaluate $H_{0}$, not $H_{a}$. Rejecting $H_{0}$ gives evidence in favour of $H_{a}$, but the framework is always “reject or fail to reject $H_{0}$”.

Question 8

A Type I error is best described as:

A. rejecting a true null hypothesis
B. rejecting a false null hypothesis
C. failing to reject a false null hypothesis

Answer

A. rejecting a true null hypothesis

A Type I error (also called a false positive) occurs when the null hypothesis is actually true but the test incorrectly rejects it. The probability of committing a Type I error is $α$ , the significance level.

Error taxonomy:

	$H_{0}$ is True	$H_{0}$ is False
Reject $H_{0}$	Type I error ( $α$ )	Correct decision (Power = $1 - β$ )
Fail to reject $H_{0}$	Correct decision ( $1 - α$ )	Type II error ( $β$ )

Why B is wrong: Correctly rejecting a false null is the ideal outcome; its probability is the power of the test ( $1 - β$ ), not an error.

Why C is wrong: Failing to reject a false null is a Type II error ( $β$ ), not a Type I error.
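
The Type I error cell of the error table can be illustrated by simulation: when $H_{0}$ is true by construction, a test at $α = 0.05$ should reject in roughly 5% of repeated samples. The sketch below is a toy under stated assumptions (normal data, known $σ = 1$, two-tailed $z$-test); the function name, seed, and sample sizes are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(trials=10_000, n=30, alpha=0.05, seed=0):
    """Share of samples drawn under a true H0 (mu = 0) that the test rejects."""
    rng = random.Random(seed)                     # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true here
        z = (sum(sample) / n) / (1.0 / sqrt(n))           # known sigma = 1
        if abs(z) > z_crit:
            rejections += 1                               # a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Empirical Type I error rate: {rate:.3f}")  # expected to be close to 0.05
```

Every rejection counted here is a false positive, because the data were generated with the null hypothesis true.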

📖 Detailed explanation

Concept review: The error matrix in hypothesis testing:

Type I error (false positive, $α$): reject $H_{0}$ when $H_{0}$ is true

Type II error (false negative, $β$): fail to reject $H_{0}$ when $H_{0}$ is false

Why A is correct: A Type I error means rejecting a true $H_{0}$, a “false alarm”: we conclude there is an effect when in fact there is none. Its probability is $α$ (the significance level).

Why B is wrong: Rejecting a false $H_{0}$ is the correct decision. Its probability is the power of the test ($1 - β$), not an error.

Why C is wrong: Failing to reject a false $H_{0}$ is a Type II error ($β$): we miss an effect that really exists. It is not a Type I error.

Question 9

A Type II error is best described as:

A. rejecting a true null hypothesis
B. failing to reject a false null hypothesis
C. failing to reject a false alternative hypothesis

Answer

B. failing to reject a false null hypothesis

A Type II error (also called a false negative) occurs when $H_{0}$ is actually false but the test fails to reject it. The probability of a Type II error is $β$ , and the power of the test is $1 - β$ .

Why A is wrong: Rejecting a true null is a Type I error ( $α$ ).

Why C is wrong: “Failing to reject a false alternative” is not standard terminology — hypothesis tests are framed around rejecting or failing to reject $H_{0}$ , not $H_{a}$ . The concept described in C is not a recognised error type in classical hypothesis testing.

Reducing Type II error: Increase sample size $n$ (increases power), raise significance level $α$ (but this increases Type I error), or increase the true effect size.

📖 Detailed explanation

Concept review: A Type II error ($β$) is a “miss”: failing to reject $H_{0}$ when $H_{0}$ is actually false. It is a false negative. Power = $1 - β$ measures the ability to detect a false $H_{0}$.

Why B is correct: Failing to reject a false $H_{0}$ is a Type II error. For example, a drug is genuinely effective but the test fails to detect it.

Why A is wrong: Rejecting a true $H_{0}$ is a Type I error ($α$), not a Type II error.

Why C is wrong: “Failing to reject a false alternative hypothesis” is not standard terminology. The testing framework always revolves around $H_{0}$, not $H_{a}$, and no error type is defined in terms of rejecting or failing to reject $H_{a}$.

Question 10

The significance level is best used to:

A. calculate the test statistic
B. define the test’s rejection points (critical values)
C. specify the probability of a Type II error

Answer

B. define the test’s rejection points (critical values)

The significance level $α$ determines the critical values that separate the rejection region from the non-rejection region. For example, $α = 0.05$ for a two-tailed $z$-test gives critical values $\pm 1.96$, so the rejection region is $|z| > 1.96$.

Why A is wrong: The test statistic is calculated from the sample data using the sample mean, the hypothesised value, and the standard error, not from the significance level. (The degrees of freedom determine which reference distribution the statistic is compared with.)

Why C is wrong: The significance level $α$ specifies the probability of a Type I error, not Type II. The probability of a Type II error ( $β$ ) depends on the true parameter value, sample size, and $α$ , but is not directly specified by $α$ .

📖 Detailed explanation

Concept review: The critical value is determined by $α$ and the distribution of the test statistic ($t$, $z$, $F$, $χ^{2}$). This is the “stating the decision rule” step of the testing procedure, and it must be completed before the data are collected.

Why B is correct: The significance level $α$ defines the rejection region and therefore the critical value, the threshold the test statistic must pass. For example, $α = 0.05$ in a two-tailed $z$-test gives critical values $\pm 1.96$.

Why A is wrong: The test statistic is calculated from sample data ($\bar{X}$, $s$, $n$, and the hypothesised value), not from the significance level.

Why C is wrong: $α$ sets the probability of a Type I error, not a Type II error. The probability of a Type II error ($β$) depends on the true parameter value, $n$, and $α$, but it is not directly specified by $α$.

Question 11

The probability of correctly rejecting a false null hypothesis is:

A. the p-value
B. the power of the test
C. the level of significance

Answer

B. the power of the test

The power of a hypothesis test is defined as:

$\text{Power} = 1 - β = P(\text{Reject } H_{0} \mid H_{0} \text{ is false})$

where $β$ is the probability of a Type II error. Power measures the test’s ability to detect a false null hypothesis.

Why A is wrong: The p-value is the probability of observing a test statistic at least as extreme as the one calculated, assuming $H_{0}$ is true. It is used to evaluate evidence against $H_{0}$ , not to measure the test’s ability to detect falseness.

Why C is wrong: The level of significance $α$ is the probability of incorrectly rejecting a true $H_{0}$ (Type I error) — the opposite of what the question asks.

Factors that increase power: larger sample size, higher $α$ , larger true effect size, lower population variability.
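
The four factors listed above can be seen in a closed-form power calculation. The sketch below assumes a right-tailed $z$-test with known $σ$, for which power equals $1 - Φ(z_{α} - δ\sqrt{n}/σ)$ with $δ$ the true effect; the function name `power_right_tailed_z` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed_z(effect, sigma, n, alpha=0.05):
    """P(reject H0: mu <= mu0) when the true mean is mu0 + effect and sigma is known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = effect / (sigma / sqrt(n))  # mean of the z statistic under the true mean
    return 1 - std_normal.cdf(z_crit - shift)

base = power_right_tailed_z(effect=0.5, sigma=1.0, n=25)
print(f"baseline power:      {base:.3f}")
print(f"larger sample:       {power_right_tailed_z(0.5, 1.0, 100):.3f}")
print(f"higher alpha:        {power_right_tailed_z(0.5, 1.0, 25, alpha=0.10):.3f}")
print(f"larger effect:       {power_right_tailed_z(0.8, 1.0, 25):.3f}")
print(f"lower variability:   {power_right_tailed_z(0.5, 0.5, 25):.3f}")
```

Each of the four variations raises power relative to the baseline. With a zero effect the function returns $α$, which is the Type I error rate rather than power in any useful sense.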

📖 Detailed explanation

Concept review: Power = $P(\text{Reject } H_{0} \mid H_{0} \text{ is false}) = 1 - β$. Power measures the ability to detect a false $H_{0}$, and a good test should have high power. Power depends on the sample size $n$, $α$, the true effect size, and the variance of the population.

Why B is correct: $β$ is the probability of a Type II error. Power = $1 - β$ is the probability of not making a Type II error, that is, the probability of correctly rejecting when $H_{0}$ is false.

Why A is wrong: The p-value is the probability, assuming $H_{0}$ is true, of observing a test statistic at least as extreme as the one calculated. It measures the evidence against $H_{0}$, not the test’s ability to detect a false $H_{0}$.

Why C is wrong: The significance level and power are different concepts: $α$ is the Type I error rate, while power is $1 -$ Type II error rate. Raising $α$ raises power, but the two are not equal.

Question 12

The power of a hypothesis test is:

A. equivalent to the significance level
B. the probability of not making a Type II error
C. unchanged by increasing the sample size

Answer

B. the probability of not making a Type II error

Power $= 1 - β = P(\text{Reject } H_{0} \mid H_{0} \text{ false})$. Since $β$ is the probability of a Type II error (failing to reject a false null), power is precisely the probability of avoiding a Type II error.

Why A is wrong: Power and the significance level are different concepts. Increasing $α$ does increase power (at the cost of a higher Type I error rate), but the two are not equivalent: the significance level is the Type I error rate, while power is $1 -$ Type II error rate.

Why C is wrong: Increasing sample size $n$ increases power. A larger sample provides more information about the population, reducing sampling error and making it easier to detect a false null hypothesis. This is why large samples can detect even very small, practically insignificant effects.

📖 Detailed explanation

Concept review: Power = $1 - β$ is the probability of rejecting $H_{0}$ when $H_{0}$ is false. Power measures how effective a test is, and a good test should have high power. Power increases when (1) $n$ increases, (2) $α$ increases, (3) the true effect size is larger, (4) the variance is lower.

Why B is correct: $β$ is the probability of a Type II error. Power = $1 - β$ is the probability of not making a Type II error. This is the most direct definition of power.

Why A is wrong: Power and the significance level are different concepts: $α$ is the Type I error rate (rejecting a true $H_{0}$), while power = $1 - β$ is the probability of avoiding a Type II error. Raising $α$ raises power, but the two are not equal.

Why C is wrong: Increasing the sample size $n$ increases power. A larger sample reduces sampling error and makes a false $H_{0}$ easier to detect. This is why studies use power analysis to determine the minimum $n$.
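
The dependence of power on sample size can be turned around to ask how many observations a target power requires. The sketch below assumes a one-tailed $z$-test with known $σ$, for which the standard approximation is $n \geq ((z_{α} + z_{β})\,σ/δ)^{2}$; the function name `min_n_one_tailed_z`, the 80% power target, and the effect sizes are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def min_n_one_tailed_z(effect, sigma, alpha=0.05, target_power=0.80):
    """Smallest n giving at least target_power for a one-tailed z-test, sigma known."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)      # cutoff set by the significance level
    z_beta = std_normal.inv_cdf(target_power)    # quantile matching 1 - beta
    return ceil(((z_alpha + z_beta) * sigma / effect) ** 2)

print(min_n_one_tailed_z(effect=0.5, sigma=1.0))  # 25
print(min_n_one_tailed_z(effect=0.1, sigma=1.0))  # 619
```

An effect one fifth the size needs roughly twenty-five times the sample, which is why very large samples can flag effects that are statistically significant yet practically negligible.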

Question 13

In the “stating the decision rule” step of hypothesis testing, the analyst must specify:

A. the critical value
B. the power of the test
C. the value of the test statistic

Answer

A. the critical value

The decision rule specifies the critical value(s) — the threshold(s) against which the test statistic will be compared. The decision rule states: “Reject $H_{0}$ if [test statistic] exceeds [critical value] in absolute value (two-tailed) or in one direction (one-tailed).”

The critical value is determined by the significance level $α$ , the distribution of the test statistic (e.g., $t$ , $z$ , $F$ , $χ^{2}$ ), and the degrees of freedom.

Why B is wrong: Power is not part of the decision rule — it is a property of the test that can be computed but is not stated as part of the formal decision procedure.

Why C is wrong: The test statistic is computed from the sample data after data collection. The decision rule is stated before computing the test statistic, specifying what value the test statistic must exceed to reject $H_{0}$ .

📖 Detailed explanation

Concept review: The hypothesis testing procedure consists of these steps: (1) state the hypotheses, (2) choose the test and the significance level, (3) state the decision rule (critical value), (4) collect the data and calculate the test statistic, (5) draw the conclusion.

Why A is correct: The “stating the decision rule” step requires specifying the critical value, the decision threshold, before looking at the data. The critical value depends on $α$, the type of test statistic, and the degrees of freedom.

Why B is wrong: Power is not part of the decision rule. It is a property of the test that can be calculated, but it is not stated in the formal decision rule.

Why C is wrong: The test statistic is calculated from the sample data after the data are collected. The decision rule must be fixed before the test statistic is calculated to avoid bias in the conclusion.
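
The ordering of the steps can be mirrored in code: the decision rule is fixed first, and only then is the test statistic computed. This is a minimal sketch of a two-tailed one-sample $t$-test; the sample values and function names are hypothetical, and the critical value 2.262 (two-tailed, $α = 0.05$, $df = 9$) is taken from a standard $t$-table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Steps 1-3: hypotheses, significance level, and decision rule, all set before the data.
MU_0 = 0.0          # H0: mu = 0 vs. Ha: mu != 0 (illustrative)
T_CRITICAL = 2.262  # two-tailed, alpha = 0.05, df = 9, from a t-table

def reject_two_tailed(test_stat: float, critical_value: float) -> bool:
    """Decision rule: reject H0 if |test statistic| exceeds the critical value."""
    return abs(test_stat) > critical_value

def one_sample_t(sample, mu_0):
    """t = (sample mean - hypothesised mean) / (s / sqrt(n))."""
    n = len(sample)
    return (mean(sample) - mu_0) / (stdev(sample) / sqrt(n))

# Step 4: collect data (made-up values, n = 10) and compute the test statistic.
returns = [0.8, -0.3, 1.1, 0.4, 0.9, -0.2, 0.7, 1.3, 0.1, 0.6]
t_stat = one_sample_t(returns, MU_0)

# Step 5: apply the rule that was stated in advance.
print(f"t = {t_stat:.3f}, reject H0: {reject_two_tailed(t_stat, T_CRITICAL)}")
```

Keeping `T_CRITICAL` as a constant defined before the data makes it harder to adjust the threshold after seeing the result.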

Question 14

A pooled estimator is used when testing the:

A. equality of two population variances
B. difference in means of two populations with unknown but assumed equal variances
C. difference in means of two populations with unknown and unequal variances

Answer

B. difference in means of two populations with unknown but assumed equal variances

When testing $H_{0} : μ_{1} = μ_{2}$ and the population variances are unknown but assumed equal ( $σ_{1}^{2} = σ_{2}^{2}$ ), the two sample variances are pooled into a single estimate:

$s_{p}^{2} = \frac{(n_{1} - 1) s_{1}^{2} + (n_{2} - 1) s_{2}^{2}}{n_{1} + n_{2} - 2}$

This pooled variance is used in the t-test statistic:

$t = \frac{(\bar{X}_{1} - \bar{X}_{2}) - (μ_{1} - μ_{2})_{0}}{s_{p} \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}$

with $df = n_{1} + n_{2} - 2$ .
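
The pooled variance and the resulting $t$-statistic can be computed directly from two samples. This is a minimal sketch using Python's standard library; the function name `pooled_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2, hypothesised_diff=0.0):
    """Return (t statistic, pooled variance, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances (n - 1 divisor)
    # Weight each sample variance by its degrees of freedom
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = sqrt(sp_sq * (1 / n1 + 1 / n2))                 # s_p * sqrt(1/n1 + 1/n2)
    t = (mean(sample1) - mean(sample2) - hypothesised_diff) / se
    return t, sp_sq, n1 + n2 - 2

# Made-up data for illustration
t, sp_sq, df = pooled_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(f"pooled variance = {sp_sq}, t = {t:.4f}, df = {df}")
```

The statistic would then be compared with a $t$ critical value for `df` degrees of freedom.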

Why A is wrong: Testing equality of variances uses the F-test (ratio of two sample variances), not a pooled estimator.

Why C is wrong: When variances are unequal (Welch’s $t$ -test), the two sample variances are not pooled — they are kept separate, and the degrees of freedom are adjusted (Welch-Satterthwaite approximation).

📖 Detailed explanation

Concept review: The pooled t-test is used when (1) comparing the means of two independent populations and (2) the variances are unknown but assumed equal. The pooled variance combines information from both samples to give a more precise estimate of the common variance.

Why B is correct: When $σ_{1}^{2} = σ_{2}^{2}$ (unknown), the two sample variances are pooled into a single estimate: $s_{p}^{2} = \frac{(n_{1} - 1) s_{1}^{2} + (n_{2} - 1) s_{2}^{2}}{n_{1} + n_{2} - 2}$. This is the case that calls for a pooled estimator.

Why A is wrong: Testing the equality of variances uses the F-test (the ratio of two sample variances), not a pooled estimator.

Why C is wrong: When the variances are unequal, Welch’s t-test is used: the two variances are kept separate rather than pooled, and the degrees of freedom are adjusted with the Welch-Satterthwaite approximation.

Question 15

For evaluating the mean differences of two dependent (paired) samples, the most appropriate test is:

A. z-test
B. chi-square test
C. paired comparisons test

Answer

C. paired comparisons test

When two samples are dependent (i.e., each observation in one sample is naturally paired with an observation in the other — e.g., before/after measurements on the same subject, or matched pairs), the appropriate test is the paired comparisons t-test.

Procedure:

Compute the difference $d_{i} = X_{1 i} - X_{2 i}$ for each pair

Test $H_{0} : μ_{d} = 0$ (or some other value) using: $t = \frac{\bar{d} - μ_{d,0}}{s_{d} / \sqrt{n}}$ with $df = n - 1$ (where $n$ = number of pairs)
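
The two-step procedure above can be sketched as follows. The function name `paired_t` and the paired observations are hypothetical; the standard error uses $s_{d}/\sqrt{n}$, where $n$ is the number of pairs.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x1, x2, mu_d0=0.0):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x1, x2)]  # d_i = X_1i - X_2i
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)              # standard error of the mean difference
    return (mean(diffs) - mu_d0) / se, n - 1

# Made-up paired observations for illustration
t, df = paired_t([10.0, 12.0, 9.0, 11.0], [8.0, 11.0, 9.0, 8.0])
print(f"t = {t:.4f}, df = {df}")
```

Working with the differences reduces the problem to a one-sample $t$-test, which is how the pairing is taken into account.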

Why A is wrong: A z-test requires known population variances or very large samples, and does not account for the paired structure of the data.

Why B is wrong: The chi-square test is used for testing a single population variance or independence/goodness of fit — not for comparing means of paired samples.

📖 Detailed explanation

Concept review: The paired comparisons t-test is used when two samples are dependent: each observation in sample 1 is naturally paired with an observation in sample 2. An example is a before/after measurement on the same subject.

Why C is correct: Dependent (paired) data call for the paired comparisons t-test. Compute $d_{i} = X_{1 i} - X_{2 i}$ for each pair, then test $H_{0} : μ_{d} = 0$ with $t = \bar{d} / (s_{d} / \sqrt{n})$, $df = n - 1$.

Why A is wrong: A z-test requires known variances or a very large sample, and it does not handle the paired structure of the data.

Why B is wrong: The chi-square test is used for testing the variance of a single population or for independence/goodness of fit. It is not used to compare the means of paired samples.

Question 16

A chi-square test is most appropriate for testing:

A. a single population variance
B. the difference in means of two populations with equal variances
C. the difference in means of two populations with unequal variances

Answer

A. a single population variance

The chi-square ( $χ^{2}$ ) distribution arises naturally when testing hypotheses about a single population variance. The test statistic is:

$χ^{2} = \frac{( n - 1 ) s ^{2}}{σ _{0}^{2}}$

with $df = n - 1$ , where $s^{2}$ is the sample variance and $σ_{0}^{2}$ is the hypothesised population variance.
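
The chi-square statistic for a single variance is a one-line calculation. The sketch below assumes a sample from a normal population; the function name `chi_square_variance_stat`, the data, and the hypothesised variance are hypothetical.

```python
from statistics import variance

def chi_square_variance_stat(sample, sigma0_sq):
    """Return (chi-square statistic, degrees of freedom) for H0: sigma^2 = sigma0_sq."""
    n = len(sample)
    s_sq = variance(sample)                 # sample variance with n - 1 divisor
    return (n - 1) * s_sq / sigma0_sq, n - 1

# Made-up data; hypothesised population variance of 2.0
chi2, df = chi_square_variance_stat([1.0, 2.0, 3.0, 4.0, 5.0], sigma0_sq=2.0)
print(chi2, df)  # 5.0 4
```

The statistic is then compared with chi-square critical values for `df` degrees of freedom; because the distribution is not symmetric, a two-tailed test needs separate lower and upper cutoffs.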

Other uses of chi-square: Testing independence in contingency tables and goodness-of-fit tests.

Why B is wrong: Testing the difference in means with equal variances uses the pooled t-test.

Why C is wrong: Testing the difference in means with unequal variances uses Welch’s t-test.

📖 Detailed explanation

Concept review: The chi-square ($χ^{2}$) test of a single population variance uses the test statistic $χ^{2} = \frac{(n - 1) s^{2}}{σ_{0}^{2}}$, which follows a chi-square distribution with $df = n - 1$. The $χ^{2}$ distribution is skewed to the right and cannot take negative values.

Why A is correct: The chi-square test is the most appropriate test for the variance of a single population, for example $H_{0} : σ^{2} = σ_{0}^{2}$. This is the core application of chi-square in inferential statistics.

Why B is wrong: Comparing the means of two populations with equal variances uses the pooled t-test.

Why C is wrong: Comparing the means of two populations with unequal variances uses Welch’s t-test.

Question 17

To test the difference between the variances of two normal populations, the most appropriate test is:

A. t-test
B. F-test
C. paired comparisons test

Answer

B. F-test

The $F$ -statistic is the ratio of two sample variances, and follows an $F$ -distribution under the null hypothesis of equal population variances:

$F = \frac{s _{1}^{2}}{s _{2}^{2}}$

where $s_{1}^{2} \geq s_{2}^{2}$ by convention (so $F \geq 1$ ). The test has degrees of freedom $(n_{1} - 1, n_{2} - 1)$ .

$H_{0} : σ_{1}^{2} = σ_{2}^{2}$ vs. $H_{a} : σ_{1}^{2} \neq σ_{2}^{2}$ (two-tailed)
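
The convention of placing the larger sample variance in the numerator also determines the order of the degrees of freedom. This is a minimal sketch; the function name `f_statistic` and the data are hypothetical.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """Return (F, (numerator df, denominator df)) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    if var_a >= var_b:
        return var_a / var_b, (n_a - 1, n_b - 1)
    # Swap so that F >= 1; the degrees of freedom swap with the variances
    return var_b / var_a, (n_b - 1, n_a - 1)

# Made-up samples: the second has the larger variance, so it goes in the numerator
f_stat, dfs = f_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
print(f_stat, dfs)  # 5.6 (5, 4)
```

With this convention only the upper critical value is needed; for a two-tailed test at level $α$ it is the $F$ cutoff with $α/2$ in the right tail.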

Why A is wrong: The t-test is used for testing hypotheses about means (one sample or two samples), not variances.

Why C is wrong: The paired comparisons t-test is used for testing mean differences in dependent samples, not for comparing variances.

📖 Detailed explanation

Concept review: The F-test compares two variances by computing the ratio $F = s_{1}^{2} / s_{2}^{2}$ (by convention the larger variance goes in the numerator so that $F \geq 1$). The F-distribution has two sets of degrees of freedom: $(n_{1} - 1, n_{2} - 1)$.

Why B is correct: The F-statistic is the ratio of two sample variances and is used to test $H_{0} : σ_{1}^{2} = σ_{2}^{2}$. The F (Fisher) distribution arises naturally as the ratio of two independent chi-square variables, each divided by its degrees of freedom.

Why A is wrong: The t-test is used to test means (one sample or two samples), not variances.

Why C is wrong: The paired comparisons t-test is used for mean differences in dependent samples and has nothing to do with comparing variances.

Question 18

A nonparametric test is most appropriate when the:

A. data consist of ranked values
B. test’s validity depends on many assumptions
C. sample is large and drawn from a possibly non-normal population

Answer

A. data consist of ranked values

Nonparametric tests are designed for ordinal (ranked) data or situations where the data do not meet the distributional assumptions required by parametric tests. When data are expressed as ranks rather than precise numerical measurements, parametric tests (which assume interval or ratio-scale data) are inappropriate.

Common situations for nonparametric tests:

Data are ordinal (ranked)

Population is heavily non-normal and sample is small

Data contain outliers that distort parametric results

No strong parametric model is available

Why B is wrong: Parametric tests require many distributional assumptions (normality, known variance, etc.). Nonparametric tests are chosen when these assumptions cannot be met — but the reason for using them is not that “the test’s validity depends on many assumptions” (that describes parametric tests themselves).

Why C is wrong: For large samples from a non-normal population, the Central Limit Theorem applies, and parametric tests (particularly the z-test) are valid. Nonparametric tests are most needed for small samples where the CLT does not apply.

📖 Detailed explanation

Concept review: Nonparametric tests are appropriate when the data are ordinal (ranked), when the population is non-normal and the sample is small, or when the data contain outliers that distort parametric results. Nonparametric tests make fewer assumptions but are usually less powerful than parametric tests when the parametric assumptions hold.

Why A is correct: Ranked (ordinal) data do not suit parametric tests, which assume interval- or ratio-scale data. Nonparametric tests such as the Spearman rank correlation or the Mann-Whitney $U$ are designed for ranked data.

Why B is wrong: Parametric tests are the ones that depend on many distributional assumptions. Nonparametric tests are chosen when those assumptions cannot be justified, not because “the test’s validity depends on many assumptions”.

Why C is wrong: With a large sample, the Central Limit Theorem applies even if the population is non-normal, so parametric tests remain valid. Nonparametric tests are most needed when the sample is small and the CLT cannot be relied on.
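To make the idea of ranked data concrete, the sketch below converts raw values to ranks, giving tied values the average of the positions they occupy. The helper name `average_ranks` and the return figures are assumptions made up for this illustration; they do not come from the practice problems.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Hypothetical returns with one extreme outlier (0.95) and one tie (0.03).
returns = [0.02, -0.01, 0.03, 0.95, 0.03]
print(average_ranks(returns))
```

The outlier 0.95 simply receives the top rank, exactly as a value of 0.04 would, which is why rank-based tests are less sensitive to extreme observations than tests built on the raw values.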

Question 19

A nonparametric test is most likely used when the:

A. sample data are ranked by magnitude
B. sample is drawn from a normal population
C. test’s validity depends on many population assumptions

Answer

A. sample data are ranked by magnitude

This question reinforces the same core concept as Question 18. When data are expressed as ranks (ordinal scale) rather than exact values, nonparametric tests such as the Spearman rank correlation, Mann-Whitney $U$ , or Wilcoxon signed-rank test are appropriate.

Why B is wrong: If the sample comes from a normal population, the distributional assumptions for parametric tests (such as the t-test) are satisfied, making parametric tests preferable due to their greater power.

Why C is wrong: Parametric tests are the ones whose validity depends on distributional assumptions. Nonparametric tests make fewer assumptions — they are chosen precisely when those parametric assumptions cannot be justified.

📖 Detailed explanation

Concept review: This question reinforces Q18 from a different angle: when data are ranked by magnitude, we know only the ordering and not the exact values, so parametric tests, which need numerical values, are not appropriate.

Why A is correct: Ranked data are ordinal: we know that “A > B > C” but not the distances between them. Nonparametric tests such as the Spearman rank correlation, Mann-Whitney $U$ , and Wilcoxon signed-rank are designed to handle this kind of data.

Why B is wrong: A normal population satisfies the assumptions of parametric tests, so parametric tests should be used because they are more powerful.

Why C is wrong: Parametric tests are the ones whose validity depends on many distributional assumptions (normality, known variance…). Nonparametric tests make fewer assumptions and are chosen precisely when those parametric assumptions cannot be justified.
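As an illustration of a rank-based statistic, here is a minimal sketch of the Spearman rank correlation using the textbook formula $r_{s} = 1 - 6 \sum d_{i}^{2} / (n(n^{2} - 1))$ , where $d_{i}$ is the difference between the two ranks of observation $i$. It assumes there are no tied values (the formula is not exact with ties), and the function names and data are hypothetical.

```python
def simple_ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rank_correlation(x, y):
    """Spearman r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid without ties."""
    n = len(x)
    rank_x, rank_y = simple_ranks(x), simple_ranks(y)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical paired observations, invented for illustration only.
x = [10.5, 12.1, 13.0, 15.2, 18.9]   # ranks 1, 2, 3, 4, 5
y = [1.4, 0.9, 2.5, 2.1, 3.0]        # ranks 2, 1, 4, 3, 5
print(spearman_rank_correlation(x, y))
```

Only the ranks enter the calculation, so any change to the raw values that preserves their ordering leaves the statistic unchanged.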

Question 20

Two funds have non-normal return distributions. An analyst has 1 year of monthly data (12 observations) and wants to test whether the mean return of one fund is greater than the other. The most appropriate test is:

A. parametric tests only
B. nonparametric tests only
C. either parametric or nonparametric tests

Answer

B. nonparametric tests only

This scenario has two characteristics that require a nonparametric approach:

Non-normal distributions: The standard parametric two-sample t-test requires approximately normal populations (or large samples for CLT to apply).

Small sample size ( $n = 12$ ): With only 12 monthly observations per fund, the sample is too small for the Central Limit Theorem to reliably normalise the sampling distribution of the mean.

With non-normal returns and a small sample, the assumptions of parametric tests are violated, making nonparametric tests (such as the Mann-Whitney $U$ test for comparing two independent groups) the only appropriate choice.

Why A is wrong: Parametric tests require either normality or large samples. Neither condition holds here.

Why C is wrong: Because the parametric test assumptions are clearly violated (non-normal + small $n$ ), parametric tests are not appropriate. The choice is not arbitrary.

Appropriate test: The Mann-Whitney $U$ test (also called the Wilcoxon rank-sum test) compares the central tendency of two independent groups without assuming normality. It uses ranked data and is valid for small samples.

📖 Detailed explanation

Concept review: When choosing a statistical test, assess two factors: (1) the population distribution (normal or not?) and (2) the sample size (large enough for the CLT to apply?). If both are unfavourable, only nonparametric tests remain.

Why B is correct: Two conditions violate the parametric assumptions at the same time: (1) non-normal distributions and (2) $n = 12$ (a small sample, for which the CLT is not yet reliable). With non-normal data and a small sample, a nonparametric test such as the Mann-Whitney $U$ is the appropriate choice for comparing two independent groups.

Why A is wrong: Parametric tests require either normality or a large sample. Neither condition is met here.

Why C is wrong: Because parametric tests are clearly inappropriate (non-normal + small $n$ ), the choice is not arbitrary. Only nonparametric tests are valid, so C is wrong to say “either”.
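The Mann-Whitney $U$ statistic mentioned above can be computed by direct pair counting, as in the sketch below. The function name and the return figures are assumptions for illustration (four observations per fund rather than twelve, to keep the arithmetic easy to follow), and the code produces only the statistic: the decision still requires comparing it with a tabulated critical value.

```python
def mann_whitney_u(sample_1, sample_2):
    """Return (U_1, U_2) by direct pair counting.

    U_1 counts the pairs in which a sample_1 value exceeds a sample_2 value;
    a tied pair contributes 0.5. By construction U_1 + U_2 = n_1 * n_2.
    """
    u_1 = 0.0
    for a in sample_1:
        for b in sample_2:
            if a > b:
                u_1 += 1.0
            elif a == b:
                u_1 += 0.5
    u_2 = len(sample_1) * len(sample_2) - u_1
    return u_1, u_2


# Hypothetical monthly returns (in percent), invented for illustration only.
fund_x = [1.8, 2.4, 3.1, 0.9]
fund_y = [0.5, 1.2, 2.0, 0.7]
print(mann_whitney_u(fund_x, fund_y))
```

Only the ordering of observations across the two samples matters, so no normality assumption is needed; a $U_{1}$ close to $n_{1} n_{2}$ indicates that the first fund's returns tend to be the larger ones.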
