A conceptual framework for empirical analysis of migration (part 1: direct empirical measurement)

This post is part 1 of a series outlining a conceptual framework for the empirical analysis of migration. Read the introductory post to the series here. This post focuses on a particular form of comparison that can be carried out through direct empirical measurement. The questions directly answered this way aren’t the ones we are usually most interested in. But at least these are questions for which we can obtain precise answers in principle. That’s a start.

Questions about how different groups of people compare for a given regime at a given point in time (or over an interval of time) can be answered by direct empirical measurement, at least for existing regimes. They cannot be directly answered for hypothetical regimes. But the fact that they can be answered at all differentiates them from other, more speculative, questions.

(Source country, target country) pairs as the basis of aggregation

The conceptual model we use identifies two attributes of a person: the person’s source country (also known as the sending country, and defined as the country the person was born in) and the person’s target country (also known as the receiving country or recipient country, and defined as the country the person now lives in). For non-migrants, the source and target country coincide. For migrants, they differ. For every individual, therefore, we can write down a (source country, target country) pair. For instance, somebody born in Mexico who stays in Mexico gets the pair (Mexico, Mexico). Somebody born in Nepal who moves to India gets the pair (Nepal, India). (This is obviously a very crude, simplified model, because some people migrate temporarily, some migrate to one country and then to another, etc. But it’s good enough to get us started.)

We’re interested in the performance on indicator X both for people who stay put in their countries, and for people with particular (source country, target country) combinations. For instance, we may be interested in asking: how does the (Nepal, India) combination fare on indicator X? Explicitly, that’s asking: how do people who are from Nepal and living in India perform on indicator X?

Mathematical digression: using a matrix representation to store the information

We can use a matrix representation where the rows correspond to source countries and the columns correspond to target countries (both rows and columns should be the same list of countries in the same order for the observations below to hold). The entry in a given cell provides information on indicator X about the collection of people whose source country is the row country and whose target country is the column country.

Let’s explicitly consider the case of three countries. Let’s say country 1 is France, country 2 is Germany, and country 3 is the United Kingdom. The indicator X values for these source and target countries can be codified via a matrix:

$latex \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\\end{pmatrix}$

The nine entries are interpreted as follows:

$latex x_{11}$ is the performance on indicator $latex X$ of the people in country 1 (France) who stay in France.
$latex x_{12}$ is the performance on indicator $latex X$ of the people who migrate from country 1 (France) to country 2 (Germany).
$latex x_{13}$ is the performance on indicator $latex X$ of the people who migrate from country 1 (France) to country 3 (the UK).
$latex x_{21}$ is the performance on indicator $latex X$ of the people who migrate from country 2 (Germany) to country 1 (France).
$latex x_{22}$ is the performance on indicator $latex X$ of the people in country 2 (Germany) who stay in Germany.
$latex x_{23}$ is the performance on indicator $latex X$ of the people who migrate from country 2 (Germany) to country 3 (the UK).
$latex x_{31}$ is the performance on indicator $latex X$ of the people who migrate from country 3 (the UK) to country 1 (France).
$latex x_{32}$ is the performance on indicator $latex X$ of the people who migrate from country 3 (the UK) to country 2 (Germany).
$latex x_{33}$ is the performance on indicator $latex X$ of the people in country 3 (the UK) who stay in the UK.

Note that the entries on the main diagonal (the one from top left to bottom right), namely $latex x_{11}$, $latex x_{22}$, and $latex x_{33}$, correspond to the non-migrants, i.e., the people who stay put in their countries. The off-diagonal entries, i.e., the entries $latex x_{ij}, i \ne j$, correspond to migrants. In this case, there are six such entries: $latex x_{12}, x_{13}, x_{21}, x_{23}, x_{31}, x_{32}$.
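To make the storage scheme concrete, here is a minimal Python sketch that holds the three-country matrix as a list of lists and separates the diagonal (non-migrant) entries from the off-diagonal (migrant) entries. The indicator values are made up purely for illustration, and the code uses Python’s 0-based indices, so the text’s $latex x_{31}$ is `x[2][0]`.

```python
# Hypothetical indicator values, for illustration only (not real data).
# Index 0 = France, 1 = Germany, 2 = United Kingdom (countries 1-3 in the text).
COUNTRIES = ["France", "Germany", "United Kingdom"]

# x[i][j]: indicator X for people whose source country is COUNTRIES[i]
# and whose target country is COUNTRIES[j].
x = [
    [10.0, 12.0, 11.0],
    [9.0, 14.0, 13.0],
    [8.0, 15.0, 12.0],
]


def diagonal_entries(matrix):
    """Entries x_ii: non-migrants who stay in their country of birth."""
    return [matrix[i][i] for i in range(len(matrix))]


def off_diagonal_entries(matrix):
    """Entries x_ij with i != j: migrants, keyed by (source, target) index pair."""
    n = len(matrix)
    return {(i, j): matrix[i][j] for i in range(n) for j in range(n) if i != j}


print(diagonal_entries(x))              # [10.0, 14.0, 12.0]
print(sorted(off_diagonal_entries(x)))  # [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```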

The three countries in the example above weren’t ordered in any particular way, so there is no significance to an entry’s being above or below the diagonal. If the countries had been ordered by some criterion (such as GDP (PPP) per capita), then the entries above and below the diagonal would reflect different types of migration, depending on whether the sending or the receiving country had the higher GDP (PPP) per capita.
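A small sketch of the ordering idea, assuming three hypothetical countries P, Q, and R with invented GDP (PPP) per capita figures: once rows and columns are sorted by the criterion, a cell’s position relative to the diagonal tells us the direction of migration.

```python
# Hypothetical GDP (PPP) per capita figures, invented for illustration only.
gdp_ppp = {"P": 20_000, "Q": 50_000, "R": 35_000}

# Order countries by the criterion (ascending), so row/column index reflects rank.
order = sorted(gdp_ppp, key=gdp_ppp.get)  # ['P', 'R', 'Q']
rank = {country: k for k, country in enumerate(order)}


def migration_direction(source, target):
    """Classify a (source, target) cell relative to the diagonal under this ordering."""
    if source == target:
        return "diagonal: non-migrants"
    if rank[source] < rank[target]:
        # Row index smaller than column index: the cell lies above the diagonal.
        return "above diagonal: to a higher-GDP country"
    return "below diagonal: to a lower-GDP country"


print(migration_direction("P", "Q"))  # above diagonal: to a higher-GDP country
print(migration_direction("Q", "R"))  # below diagonal: to a lower-GDP country
```

With this ordering, everything above the diagonal is migration toward richer countries and everything below it is migration toward poorer ones; under a different criterion the same cells would be classified differently.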

The simplified example here considers migration between three countries. However, if we want to study migration worldwide, we’d need to include all countries. If there are 200 countries, then we’d have a $latex 200 \times 200$ matrix, with a total of 40,000 entries. In general, if there are $latex n$ countries, the matrix is an $latex n \times n$ matrix with a total of $latex n^2$ entries, of which $latex n$ are diagonal entries (corresponding to the people who stay put in their respective countries) and $latex n^2 - n = n(n-1)$ are off-diagonal entries (corresponding to people who migrate from one country to another). Half of the off-diagonal entries ($latex n(n-1)/2$) are above the diagonal and the other half are below it, but the above/below distinction matters only if the countries are ordered according to some criterion.
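The counting argument can be checked with a few lines of Python (a sketch; `cell_counts` is just an illustrative helper name):

```python
def cell_counts(n):
    """Return the cell counts for an n x n (source, target) matrix."""
    total = n * n
    diagonal = n                  # non-migrants: one cell per country
    off_diagonal = n * (n - 1)    # migrants: n^2 - n cells
    above = off_diagonal // 2     # n(n-1)/2; n(n-1) is always even
    below = off_diagonal - above
    return {"total": total, "diagonal": diagonal, "off_diagonal": off_diagonal,
            "above": above, "below": below}


print(cell_counts(3))    # {'total': 9, 'diagonal': 3, 'off_diagonal': 6, 'above': 3, 'below': 3}
print(cell_counts(200))  # {'total': 40000, 'diagonal': 200, 'off_diagonal': 39800, 'above': 19900, 'below': 19900}
```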

Now, there may be cases where migration between two countries is so quantitatively small, or even actually zero, that it’s not meaningful to compute that particular matrix entry. For instance, I think there is zero migration from North Korea to Somalia. So, some entries of the matrix are not defined. This means that we need to be careful if we intend to subject the matrix to techniques of linear algebra. However, we’re using the matrix only to store information, and we don’t perform matrix operations.
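One way to respect undefined entries, assuming we store the matrix sparsely as a Python dictionary keyed by (source country, target country), is to return `None` for missing cells instead of a number. The values below are hypothetical.

```python
# Store only the (source, target) cells with enough people to be meaningful.
# Values are hypothetical placeholders, not real data.
x = {
    ("France", "France"): 10.0,
    ("France", "Germany"): 12.0,
    ("Germany", "Germany"): 14.0,
}


def entry(source, target):
    """Return the indicator value, or None if the cell is undefined.

    Returning None (rather than 0) keeps "no data / no migrants" distinct
    from a genuinely measured value of zero.
    """
    return x.get((source, target))


print(entry("France", "Germany"))  # 12.0
print(entry("Germany", "France"))  # None: cell not defined
```

Since we only store and look up information, this is all we need; anything that sums or averages over cells should skip the `None` entries explicitly.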

End mathematical digression

Totals versus averages

In some cases, we care about the per capita level of an indicator. This is usually the case for indicators such as GDP per capita, crime rates, or unemployment rates. In cases where fixed resources are being used up, however, we may care more about the total use. An example may be water use in a country that has a fairly limited water supply. If we’re concerned about total use, then in addition to knowing the per capita value on indicator X for (source country, target country) pairs, we also need to know the size of each population.

The relative size of different populations may matter even if we are concerned only about averages, because we need relative sizes to compute weighted averages.
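The difference between totals and weighted averages can be illustrated with a minimal sketch. The per capita water-use figures and population sizes below are invented for illustration; they are not real data for any country.

```python
# Hypothetical figures for the groups living in target country B:
# (source, target) -> (per capita water use, population).
groups = {
    ("B", "B"): (50.0, 9_000_000),  # natives of B who stay put
    ("A", "B"): (40.0, 1_000_000),  # migrants from A to B
}


def total_use(groups):
    """Total resource use: per capita value times group size, summed over groups."""
    return sum(value * pop for value, pop in groups.values())


def weighted_average(groups):
    """Per capita value for everyone combined, weighting each group by its size."""
    return total_use(groups) / sum(pop for _, pop in groups.values())


def unweighted_average(groups):
    """Plain mean of the group values, ignoring group sizes."""
    values = [value for value, _ in groups.values()]
    return sum(values) / len(values)


print(total_use(groups))           # 490000000.0
print(weighted_average(groups))    # 49.0
print(unweighted_average(groups))  # 45.0
```

The unweighted figure (45.0) is misleading here because the native group is nine times larger than the migrant group; weighting by population gives 49.0.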

Weighted averages for residents, naturals, immigrants, and emigrants

In some cases, we are interested not in a particular (source country, target country) combination, but in combining information for all people in a particular source or target country. The following are four typical weighted averages we are interested in. If we are looking at a total of $latex n$ countries, then there are $latex n$ weighted averages of each type (one for each country) and therefore a total of $latex 4n$ weighted averages to consider.

The weighted average for all residents of a country, including natives of the country who stay put and migrants from other countries to that country.
The weighted average for all naturals of a country, including natives of that country who stay put and people from that country who migrate to other countries.
The weighted average for all immigrants to a country, i.e., people who have that as their target country but are from other source countries.
The weighted average for all emigrants from a country, i.e., people who have that as their source country but now live in other countries.
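Each of the four weighted averages corresponds to a row or a column of the matrix, with or without its diagonal cell: residents of country $latex j$ are column $latex j$, naturals of country $latex i$ are row $latex i$, immigrants are column $latex j$ without its diagonal entry, and emigrants are row $latex i$ without its diagonal entry. Here is a minimal sketch, assuming we also have a matrix of population sizes for the same cells and marking undefined cells with `None`. All numbers are hypothetical.

```python
def weighted_mean(cells, values, pops):
    """Population-weighted mean over (i, j) cells, skipping undefined cells."""
    num = den = 0.0
    for i, j in cells:
        if values[i][j] is None or pops[i][j] == 0:
            continue  # undefined cell: no (or too few) people in this pair
        num += values[i][j] * pops[i][j]
        den += pops[i][j]
    return num / den if den else None


def residents(k, values, pops):
    """Everyone living in country k: column k, diagonal included."""
    return weighted_mean([(i, k) for i in range(len(values))], values, pops)


def naturals(k, values, pops):
    """Everyone born in country k: row k, diagonal included."""
    return weighted_mean([(k, j) for j in range(len(values))], values, pops)


def immigrants(k, values, pops):
    """People living in k but born elsewhere: column k, diagonal excluded."""
    return weighted_mean([(i, k) for i in range(len(values)) if i != k], values, pops)


def emigrants(k, values, pops):
    """People born in k but living elsewhere: row k, diagonal excluded."""
    return weighted_mean([(k, j) for j in range(len(values)) if j != k], values, pops)


# Hypothetical values and population sizes; cell (1, 2) is undefined.
values = [[10, 20, 30], [40, 50, None], [70, 80, 90]]
pops = [[100, 10, 10], [20, 200, 0], [30, 30, 300]]

print(residents(0, values, pops))   # 26.0
print(naturals(0, values, pops))    # 12.5
print(immigrants(0, values, pops))  # 58.0
print(emigrants(0, values, pops))   # 25.0
print(emigrants(1, values, pops))   # 40.0 (undefined cell (1, 2) skipped)
```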

Typical forms of comparison

After figuring out how various (source country, target country) combinations, or weighted averages thereof, fare, we can then ask how they compare with one another. Here are some typical questions that can be asked. We’ll use the letter A to denote a hypothetical source country and the letter B to denote a hypothetical target country, but you can replace these with concrete instances (such as France and the United Kingdom):

1. How do migrants from country A to country B compare with natives of country B (the target country) on indicator X?
2. How do migrants from country A to country B compare with natives of country A (the source country) on indicator X?
3. How do migrants to country B compare with resident natives of that country on X?
4. How do migrants from country A compare with resident natives of that country on X?
5. How do migrants from country A compare with the natives of the countries they go to on X (combined analysis for all countries they go to)?
6. How do migrants to country B compare with the natives of their source countries on X (combined analysis for all source countries)?
7. How do migrants in general compare with non-migrants in general on X?
8. How do natives of a country receiving migrants compare with natives of a country sending migrants on X? One advantage of this question is that it can be asked without collecting separate statistics on migrants, and it can also be asked before migration policies are implemented, although the answer might change afterward.

Mathematical digression: interpretation of the questions in matrix terms

Here is what each of the questions looks like in terms of the matrix representation. For illustrative purposes, we will continue to draw on the three-country setup above, with country 1 as France, country 2 as Germany, and country 3 as the United Kingdom.

1. Compare a matrix entry with the diagonal entry in its column. If we’re interested in studying migration from the UK to France, we compare the entry $latex x_{31}$ (migrants from the UK to France) with the entry $latex x_{11}$ (French natives who stay put).
2. Compare a matrix entry with the diagonal entry in its row. If we’re interested in studying migration from the UK to France, we compare the entry $latex x_{31}$ (migrants from the UK to France) with the entry $latex x_{33}$ (UK natives who stay put).
3. Compare the (weighted) average of the off-diagonal entries in a column with the diagonal entry of that column. If we are interested in understanding migration to Germany, we need to compare the entries $latex x_{12}$ and $latex x_{32}$ (migrants from France and the UK to Germany) with $latex x_{22}$ (Germans who stay put). We would usually compute the average of $latex x_{12}$ and $latex x_{32}$ weighted by the respective population sizes.
4. Compare the (weighted) average of the off-diagonal entries in a row with the diagonal entry of that row. If we are interested in understanding migration from France, we need to compare the entries $latex x_{12}$ and $latex x_{13}$ (migrants from France to Germany and to the UK) with the entry $latex x_{11}$ (French who stay put).
5. A set of pairwise comparisons of the type in Question 1 (pairs in the same column). If we’re interested in how migrants from France compare with the natives wherever they go, we compare $latex x_{12}$ with $latex x_{22}$ (French migrants to Germany versus Germans who stay put), and separately compare $latex x_{13}$ with $latex x_{33}$ (French migrants to the UK versus UK natives who stay put).
6. A set of pairwise comparisons of the type in Question 2 (pairs in the same row). If we’re interested in how migrants to the UK fare relative to the natives of their source countries, we compare $latex x_{13}$ with $latex x_{11}$ (French who move to the UK versus French who stay put), and separately compare $latex x_{23}$ with $latex x_{22}$ (Germans who move to the UK versus Germans who stay put).
7. The off-diagonal entries represent migrants, and the diagonal entries represent people who do not migrate. This question therefore compares the (population-weighted) average of all off-diagonal entries with the (population-weighted) average of all diagonal entries.
8. This compares two diagonal entries. If we’re interested in comparing natives of Germany with natives of the UK, we compare $latex x_{22}$ and $latex x_{33}$.
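The eight comparisons can be sketched in Python against the three-country setup. This is an illustrative sketch only: the values and population sizes are made up, indices are 0-based (France = 0, Germany = 1, UK = 2), and each comparison is expressed as a simple difference, although a ratio might suit some indicators better.

```python
# Hypothetical indicator values and population sizes (illustration only).
# Index 0 = France, 1 = Germany, 2 = United Kingdom.
values = [[10.0, 12.0, 11.0],
          [9.0, 14.0, 13.0],
          [8.0, 15.0, 12.0]]
pops = [[1000, 20, 30],
        [10, 1000, 40],
        [50, 10, 1000]]
N = len(values)


def wmean(cells):
    """Population-weighted mean of values over a list of (row, column) cells."""
    total = sum(values[i][j] * pops[i][j] for i, j in cells)
    return total / sum(pops[i][j] for i, j in cells)


def q1(i, j):
    """Migrants i -> j versus stayers of target j (same column)."""
    return values[i][j] - values[j][j]


def q2(i, j):
    """Migrants i -> j versus stayers of source i (same row)."""
    return values[i][j] - values[i][i]


def q3(j):
    """All immigrants to j (off-diagonal column j) versus stayers of j."""
    return wmean([(i, j) for i in range(N) if i != j]) - values[j][j]


def q4(i):
    """All emigrants from i (off-diagonal row i) versus stayers of i."""
    return wmean([(i, j) for j in range(N) if j != i]) - values[i][i]


def q5(i):
    """Emigrants from i versus natives of each destination j, one pair at a time."""
    return {j: q1(i, j) for j in range(N) if j != i}


def q6(j):
    """Immigrants to j versus natives of each source i, one pair at a time."""
    return {i: q2(i, j) for i in range(N) if i != j}


def q7():
    """All migrants (off-diagonal) versus all non-migrants (diagonal)."""
    off = [(i, j) for i in range(N) for j in range(N) if i != j]
    diag = [(i, i) for i in range(N)]
    return wmean(off) - wmean(diag)


def q8(a, b):
    """Stayers of country a versus stayers of country b (two diagonal entries)."""
    return values[a][a] - values[b][b]


print(q1(2, 0))  # UK -> France migrants minus French stayers: -2.0
print(q3(1))     # migrants to Germany minus German stayers: -1.0
```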

End mathematical digression

Remarks on selection and treatment effects

We’ll return to this in more depth in part 2, but here are a few preliminary remarks.

The migration policy regime and other aspects of the scenario (economic policies, economic performance, linguistic differences, etc.) affect the indicator matrix in two ways:

A compositional selection effect (for short, we’ll call this a selection effect or a compositional effect) for the groupings, i.e., the choice of the migration policy scenario determines who migrates and who doesn’t, and therefore affects what set of people get included in various (source country, target country) pairs.
A treatment effect for the groupings, i.e., some people being able to migrate affects their own performance on indicator X, and also affects the performance on the indicator of others who stay behind in their own countries.
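A toy illustration of a pure selection effect, under the explicit assumption that there is no treatment effect at all (nobody’s value on indicator X changes when they move): the average for people who stay put still depends on who leaves. The ten values are hypothetical.

```python
# Ten hypothetical people born in the same source country, with their values on
# indicator X. Assumption: NO treatment effect, so values do not change on moving.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


def mean(xs):
    return sum(xs) / len(xs)


def split_means(values, migrant_ids):
    """Return (mean for migrants, mean for stayers) given who migrates."""
    migrants = [v for k, v in enumerate(values) if k in migrant_ids]
    stayers = [v for k, v in enumerate(values) if k not in migrant_ids]
    return mean(migrants), mean(stayers)


print(mean(values))                 # 5.5 if nobody migrates
print(split_means(values, {8, 9}))  # (9.5, 4.5): the top two migrate
print(split_means(values, {0, 1}))  # (1.5, 6.5): the bottom two migrate
```

The diagonal entry for this country moves from 5.5 to either 4.5 or 6.5 even though no individual’s value changed; that is the compositional effect in isolation.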

In Part 2, we will look more closely at how to isolate selection and treatment effects when comparing different policy regimes.

Remarks on measurability

For existing policy regimes, the performance on particular indicators of particular (source country, target country) pairs can be computed in principle. Some methods involve complete measurement: for instance, census data that asks people to identify their country of origin, or computerized records of all residents along with their source country. Other methods involve the use of partial data along with sampling techniques to extrapolate to the general population.

Some challenges:

In some cases, there is ambiguity, both conceptual and empirical, about the source country of individuals, or about what it means to be a resident (for instance, do we count crimes by tourists?).
In some cases, people deliberately conceal or misrepresent information about themselves where the stakes are high. For instance, a foreign-born person may claim to be native-born when arrested for a misdemeanor, in order to avoid deportation. On the other hand, those who would rather be deported to another country than spend time in prison may falsely claim to be foreign-born. People may lie to get access to welfare benefits. False identity documentation may be produced in order to be eligible to work.
In some cases, the population involved is so small that the indicator cannot be reliably measured from samples of the overall population. For instance, there are about 100 people in the US who were born in North Korea. A random sample would probably not pick any of them. Even if it did, statistical averages computed for such a small group would not be robust.
There are challenges when considering the comparability of indicators across different target countries (and in some cases even within a particular country), because different countries (and different jurisdictions within a country) use different protocols for measurement and have different sources of bias. For instance, the rate of crime reporting may differ considerably between countries, particularly for rape and minor theft. Similarly, when comparing income values, purchasing power parity estimates are not necessarily reliable.
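To see why a group this small tends to be missed, here is a rough calculation. It assumes simple random sampling with independent draws (a close approximation when the sample is a tiny fraction of the population); the US population of roughly 330 million and the sample size of 10,000 are assumptions, while the figure of about 100 North Korea-born residents comes from the text above.

```python
def prob_none_sampled(subgroup_size, population_size, sample_size):
    """Approximate chance that a simple random sample contains nobody from the subgroup.

    Treats draws as independent (sampling with replacement), a close
    approximation when the sample is a tiny fraction of the population.
    """
    p = subgroup_size / population_size
    return (1 - p) ** sample_size


# Assumed inputs: ~100 subgroup members, ~330 million people, a 10,000-person sample.
prob = prob_none_sampled(100, 330_000_000, 10_000)
expected = 10_000 * 100 / 330_000_000  # expected number of subgroup members sampled
print(f"P(no one sampled) approx. {prob:.4f}; expected number sampled approx. {expected:.4f}")
```

Under these assumptions, the chance of sampling nobody from the group is above 99%, and even a hit would typically be a single person, far too few for a robust average.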

Normative significance of comparisons

The measurements and comparisons here offer only a starting point for investigating the effects of migration: we’d need comparative statics between different regimes in order to tease out the effects of migration. We’ll talk about this more in part 2 and in part 3. But in many cases, our only reliable empirical measurements are the direct ones discussed here, and people often draw conclusions based on this evidence. The following are three typical styles of crude conclusion people draw.

Immigrants to country B do better (respectively, worse) on the indicator than natives of country B who stay in their country $latex \implies$ immigration “good” (respectively, “bad”) for country B.
Emigrants from country A do better (respectively, worse) on the indicator than natives of country A $latex \implies$ emigration “bad” (cf. brain drain) (respectively, “good”) for country A.
Natives of country A do worse on the indicator than natives of country B $latex \implies$ migration from country A to country B “good” for country A and “bad” for country B.

Of course, put so bluntly, the claims seem obviously ill-substantiated, and they often break down in practice.

But apart from the need to do more sophisticated counterfactual analysis to actually talk about the effects of migration, there’s another important point: the overall levels of an indicator might matter more than how different groups compare on it. The relative crime rates of natives and migrants are not as important as knowing whether either group has a high crime rate. The relative fertility rates are similarly less important than the overall fertility level. Too much focus on the question of “are immigrants better than natives?” can lead us to ignore other questions of greater moral and practical relevance.