Evidence-based
practice is a method of incorporating the best available evidence to support an
intervention (physiotherapy, medical, nutritional, etc.). This involves
utilizing the scientific literature as well as clinical/field experience.
However, this does not suggest that we should incorporate the literature
blindly. We must critically appraise the literature by assessing its internal
and external validity.

Internal validity reflects the quality of the study design,
implementation, and data analysis in order to minimize the level of bias and
determine a true ‘cause and effect’ relationship between an intervention and an
outcome (1). External validity
describes the circumstances under which the results of the research can be
generalized (the population of interest).

Please note that the majority
of the sections listed below are components of the PEDro Scale. There are a few
additional sections described below that may add to the quality of a critical
appraisal. In no way is the list complete, as there are many factors to
consider when critically appraising the literature.

**Literature review**

An adequate literature review provides a thorough background
of the topic in the introduction. It should include previous research conducted
on the topic and a reason why the current study is being conducted.

**Participant selection**

The participants included in a study are considered a sample
of the population of interest. There are two forms of sampling that can be used
to select participants: probability sampling and nonprobability sampling (**3**). The type of sampling can influence the characteristics of the sample, which in turn influence the generalizability, or external validity (**2**).

**Eligibility criteria**

This is a list of criteria used to determine who was
eligible to participate in the study (**5**). This affects external validity, but not internal or statistical validity (**5**). The more homogeneous the sample based on the inclusion/exclusion criteria, the lower the external validity (**5**).

**Sample size**

This is the number of subjects included in the study from a
given population. Larger sample sizes tend to be more representative of the
population.
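
A quick simulation makes this concrete. The sketch below (a hypothetical population and illustrative numbers only) shows that means from larger samples cluster more tightly around the population mean, i.e. a typical large sample is more representative:

```python
import random
import statistics

# Hypothetical population: 100,000 values with a known mean (illustrative only).
random.seed(42)
population = [random.gauss(50, 10) for _ in range(100_000)]

def spread_of_sample_means(n, trials=200):
    """Draw `trials` random samples of size n and return the standard
    deviation of their means. A smaller spread means the typical sample
    mean sits closer to the population mean."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(1000)
print(small > large)  # means from larger samples vary less
```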

**Randomization**

Participants are randomly assigned to one of two or more
interventions. Randomization minimizes the risk of confounding variables,
reduces the risk of bias, and allows for examination of direct relationships. A
major limitation of using an RCT is that it is impossible to obtain a true
random sample of the population (**2**). This can make it difficult to generalize the results.

**Concealed allocation**

The person responsible for determining whether a subject was
eligible for inclusion in the trial was unaware to which group the subject
would be allocated (**5**). If allocation is not concealed, the decision about whether or not to include a person in a trial could be influenced by knowledge of whether the subject was to receive treatment or not. This could produce systematic biases in otherwise random allocation (**5**).

**Blinding**

This is a method used to prevent study participants, as well
as those collecting and analyzing data, from knowing who is in the intervention
group and who is in the control group (**1**). When subjects are blinded, it is less likely that the results of treatment are due to a placebo effect (**5**). Blinding assessors prevents their personal bias from affecting the results (**5**).

**Baseline comparability**

Baseline comparability involves a comparison of the baseline
values of the groups (intervention and control). There should be no statistically
significant difference between groups. An appropriate randomization should
ensure that groups are similar at baseline. This may provide an indication of
potential bias arising by chance with random allocation (**5**). A significant difference between groups may indicate an issue with the randomization procedures (**5**).

**Outcomes**

The outcome measure is the method used to assess the effect
of the intervention. The purpose of an outcome measure is to discriminate among
subjects at one point in time, predict a subsequent event or outcome, and
assess change over time.

__Standardized outcome measures__

1. Have explicit instructions for administering, scoring, and interpreting results.

2. Are supported to the extent that information concerning their measurement properties has been estimated, reported, and defended in the peer-reviewed literature.

3. Are valid, reliable, sensitive, and specific.

__Measures of validity__

1. The outcome measure assesses what it is intended to measure (face validity).

2. The outcome measure is appropriate for the population of interest (content validity).

3. The outcome measure provides results that are consistent with the gold standard (criterion validity).

__Measures of reliability__

1. Multiple assessments of one individual provide consistent results: across repeated testing sessions (test-retest/absolute reliability), among different assessors (inter-rater reliability), and within the same assessor (intra-rater reliability).

2. The outcome measure can determine the degree to which the condition exists (relative reliability).

Sensitivity is the ability of a test to reliably detect the presence of a condition (**3**), calculated as true positives/(true positives + false negatives). Because a highly sensitive test produces few false negatives, a negative result helps rule the condition out (SNOUT: a SeNsitive test, when Negative, rules OUT).
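
Sensitivity, and its counterpart specificity (described next), can be computed directly from a 2×2 table of test results; the counts below are hypothetical, for illustration only:

```python
# Hypothetical 2x2 diagnostic-test counts (illustrative numbers only).
true_positive = 90   # test positive, condition present
false_negative = 10  # test negative, condition present
true_negative = 80   # test negative, condition absent
false_positive = 20  # test positive, condition absent

# Sensitivity: proportion of those WITH the condition who test positive.
sensitivity = true_positive / (true_positive + false_negative)

# Specificity: proportion of those WITHOUT the condition who test negative.
specificity = true_negative / (true_negative + false_positive)

print(sensitivity)  # 0.9
print(specificity)  # 0.8
```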
Specificity is the ability of a test to reliably detect the absence of a condition (**3**), calculated as true negatives/(true negatives + false positives). Because a highly specific test produces few false positives, a positive result helps rule the condition in (SPIN: a SPecific test, when Positive, rules IN).

**Intervention**

The intervention should be described in enough detail for
reproducibility. An inadequate description decreases internal validity because
it leaves unclear the exact mechanism that led to the change in outcomes.

**Adequate follow-up**

The number of subjects who completed the trial to provide
follow-up data for statistical analysis must be sufficient. The PEDro group
states that data collected from a minimum of 85% of subjects increases internal
validity (**5**). It is important that measurements of outcome are made on all subjects who are randomized to groups. Subjects who are not followed up may differ systematically from those who are, and this potentially introduces bias. The magnitude of the potential bias increases with the proportion of subjects not followed up (**5**).

**Intention-to-treat analysis**

This is a strategy that ensures that all patients allocated
to either the treatment or control groups are analysed together as representing
that treatment arm whether or not they received the prescribed treatment or
completed the study (**1**). When patients are excluded from the analysis, the main rationale for randomization is defeated, leading to potential bias (**5**).

**Between-group comparisons**

This comparison is a statistical comparison of one group
with another. It is performed to determine if the difference between groups is
greater than can plausibly be attributed to chance (**5**).

**Point estimates (effect size) and variability**

__Point Estimates__

A point estimate or effect size is a value that represents the most likely estimate of the true population value (**4**). Some examples include the mean difference, regression coefficient, Cohen’s d, and correlation coefficient.
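
As a sketch, two of these point estimates (the mean difference and Cohen’s d) can be computed from two sets of outcome scores; all numbers below are hypothetical:

```python
import statistics

# Hypothetical outcome scores for two groups (illustrative numbers only).
treatment = [12.0, 14.0, 11.0, 15.0, 13.0]
control = [10.0, 9.0, 11.0, 8.0, 12.0]

# Point estimate 1: the mean difference between groups.
mean_diff = statistics.mean(treatment) - statistics.mean(control)

# Point estimate 2: Cohen's d -- the mean difference divided by the pooled SD,
# which expresses the effect in standard-deviation units.
n1, n2 = len(treatment), len(control)
s1, s2 = statistics.stdev(treatment), statistics.stdev(control)
pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
cohens_d = mean_diff / pooled_sd

print(round(mean_diff, 2))  # 3.0
print(round(cohens_d, 2))   # 1.9
```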

__Variability__

It is important to consider the variability of the effect size (point estimate). A few examples of variability include the standard deviation, the standard error, and a range of values. The standard deviation is an estimate of the degree of scatter (variability) of individual sample data points about the sample mean (**7**). The standard error is the standard deviation of the test statistic (point estimate) obtained from all the samples randomly drawn from the population (**3**). It quantifies the variability of mean values and is used to derive confidence intervals. Since the standard deviation is always greater than the standard error (SE = SD/√n), authors sometimes present data as “Mean ± SEM” instead of “Mean ± SD”. This under-estimates the true variability of the data and can misdirect the reader.
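
A brief sketch (with hypothetical measurements) shows the relationship SE = SD/√n and why the SEM is always the smaller of the two for n > 1:

```python
import statistics

# Hypothetical sample of individual measurements (illustrative numbers only).
sample = [23.0, 25.0, 21.0, 27.0, 24.0, 22.0, 26.0, 20.0]

n = len(sample)
sd = statistics.stdev(sample)  # scatter of individual data points
sem = sd / n ** 0.5            # SEM = SD / sqrt(n): variability of the mean

# Because SEM = SD / sqrt(n), "Mean +/- SEM" error bars are narrower than
# "Mean +/- SD" bars and understate the spread of the raw data.
print(sem < sd)  # True
```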

__Confidence Intervals__

It is common for variability to be displayed as a confidence interval (CI). A confidence interval is the range of values that, with a given probability, contains the true population parameter (**4**). The width of the CI depends on the standard error and the degree of confidence we choose (usually 90%, 95%, or 99%). With a 95% CI, we are 95% confident that the true population mean falls within the interval. If a 95% CI for a mean change includes zero, there is a real possibility that the mean change produced by the intervention in the population is zero; therefore, the result is not statistically significant.
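
As an illustration, an approximate 95% CI for a mean change can be computed from the SEM and a critical t value; the change scores below are hypothetical:

```python
import statistics

# Hypothetical change scores after an intervention (illustrative only).
changes = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 1.5, 2.5]

n = len(changes)
mean = statistics.mean(changes)
sem = statistics.stdev(changes) / n ** 0.5

# Two-tailed 5% critical t value for n - 1 = 7 degrees of freedom.
t_crit = 2.365
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem

# The interval excludes zero here, so this mean change would be
# statistically significant at the 5% level.
print(ci_low > 0)  # True
```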

**Study Limitations**

A description of the limitations of the study design and
methodology allows for transparency as it should describe potential biases.
This includes an explanation of possible errors in internal and external
validity.

**REFERENCES**

1. Akobeng A. Understanding randomized controlled trials. Arch Dis Child 2005;90:840-844.

2. Carter R, Lubinsky J, Domholdt E. Rehabilitation Research. 4th ed. St. Louis, Missouri: Elsevier; 2010.

3. Gaddis G, Gaddis M. Introduction to biostatistics: part 3, sensitivity, specificity, predictive value, and hypothesis testing. Ann Emerg Med 1990;19:145-151.

4. Nakagawa S, Cuthill I. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev 2007;82(4):591-605.

5. Physiotherapy Evidence Database. PEDro Scale (1999). http://www.pedro.org.au/english/downloads/pedro-scale/. Accessed January 27, 2015.