How do we build correspondence tables?

Principles

These are the guiding principles we follow to deliver the highest-quality correspondence tables and save you hours of data preparation time.

Human-made

AI sounds appealing, but it is not a good fit for one-off, unstructured tasks that require the highest accuracy. All of our mappings are the result of manual, human work.

Two sets of eyes

Each crosswalk table is mapped by one person and verified by another. This verification focuses on confirming the semantic accuracy of the mapping.

Technical quality assurance

Technical QA eliminates file formatting issues and helps us flag potential problems with the semantic accuracy of the mappings.
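To give a sense of what a file-formatting check can look like, here is a minimal Python sketch. It assumes a draft crosswalk held as a list of (from_code, to_code) pairs and flags empty codes, stray whitespace, and duplicate rows. The function name `find_format_issues`, the row layout, and the sample codes are hypothetical; this is an illustration, not the in-house tooling itself.

```python
from collections import Counter


def find_format_issues(rows):
    """Return human-readable formatting problems found in (from_code, to_code) rows.

    Hypothetical check: flags empty codes, leading/trailing whitespace,
    and rows that appear more than once.
    """
    issues = []
    for i, (src, dst) in enumerate(rows):
        for label, code in (("from", src), ("to", dst)):
            if not code.strip():
                issues.append(f"row {i}: empty {label} code")
            elif code != code.strip():
                issues.append(f"row {i}: whitespace around {label} code {code!r}")
    # Flag (from, to) pairs that occur more than once in the draft.
    for pair, count in Counter(rows).items():
        if count > 1:
            issues.append(f"duplicate row {pair} appears {count} times")
    return issues


# Made-up draft rows with three deliberate problems.
draft = [("A01", "X10"), ("A02 ", "X11"), ("A01", "X10"), ("A03", "")]
for issue in find_format_issues(draft):
    print(issue)
```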

100% coverage

Every code from the base ("from") industry classification appears in the crosswalk table.
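Coverage of this kind can be verified mechanically. The sketch below assumes the full "from" code list and the crosswalk rows are available as plain Python collections; the function name `coverage_report` and the codes are hypothetical. Full coverage corresponds to an empty "missing" list, and codes reported as "unknown" are often a sign of a typo.

```python
def coverage_report(from_codes, crosswalk):
    """Compare the full "from" code list with the codes used in the crosswalk.

    Hypothetical helper. Returns a dict with:
      - "missing": "from" codes that have no row in the crosswalk
      - "unknown": codes used in the crosswalk that are not in the "from" list
    """
    known = set(from_codes)
    mapped = {src for src, _dst in crosswalk}
    return {
        "missing": [code for code in from_codes if code not in mapped],
        "unknown": sorted(mapped - known),
    }


# Made-up example: A02 is never mapped, and A04 does not exist in the "from" list.
from_codes = ["A01", "A02", "A03"]
crosswalk = [("A01", "X10"), ("A01", "X11"), ("A03", "X12"), ("A04", "X13")]
print(coverage_report(from_codes, crosswalk))  # {'missing': ['A02'], 'unknown': ['A04']}
```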

The step-by-step process we use to create each correspondence table

  • 1

    Research the "from" and "to" classifications.

    In this step, we learn as much as we can about the classifications to be mapped. This includes the methodology used in their creation, their structure, and similarities to other classifications we have worked with in the past.

  • 2

    Find the most authoritative source of the classification system and get the raw classification files.

    If you have ever had to work with industry classifications and were confused by how difficult it is to obtain a full list of codes with their descriptions, you're not alone. Data produced by public institutions is mostly free, but not easy to obtain. We jump through the same hoops to find the lists of codes.

  • 3

    Standardize the raw lists of classification codes with descriptions.

    We transfer the codes and descriptions to our internal tools so that each classification we work with is available in the same format. This allows us to quickly search through the lists of hundreds (or thousands) of codes and find the equivalent descriptions.

  • 4

    Find equivalents in the "to" classification for each code in the "from" classification.

    This is the "meat and potatoes" of our work. We go one by one through the list of codes in the "from" classification and find the best equivalents for them in the "to" classification. We strive for a one-to-one match, but in some cases this is not possible. Different classifications were created for different regions and purposes. Some branches of economic activity matter more in some regions than in others, so they are covered in more or less detail in different classifications. For this reason, our crosswalk tables are of the one-to-many type. Whenever possible, we include the single best-matching code. Where the correspondence between descriptions would be too weak, we include additional codes, which results in a one-to-many mapping.

  • 5

    Run automated checks on the draft of the correspondence table.

    Since correspondence tables involve a large number of human "judgment calls", and humans can be biased and make mistakes, we use a set of proprietary tools to check automatically for common errors. These range from simple checks, such as for typos, to more complex ones, such as the coverage of codes from the "to" classification in the mapping. The data preparation specialist works on the correspondence table until all automated checks pass.

  • 6

    Semantic review by another data preparation specialist

    With all automated checks passing, the correspondence table is handed over to another data preparation specialist for a semantic review. During this review, the reviewer cross-checks at least 15% of the codes for semantic accuracy (i.e., checks whether the descriptions of the mapped codes actually overlap). If the cross-check does not meet the accuracy threshold, the crosswalk table goes back to the original data preparation specialist for a full revision, and the cross-check is repeated until the accuracy threshold is met.

  • 7

    Perform the second set of automated checks on the final draft of the correspondence table.

    New errors can creep in during review and revision. To keep them out of the final crosswalk table, we run the draft through the automated checks once more.

  • 8

    Create the final output files

    The last step is a final touch-up: we use our in-house tools to create the final .csv and .json files for the client.
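Steps 4, 6, and 8 can be illustrated together in a short Python sketch. It rests on assumptions rather than on the in-house tooling: the `from_code`/`to_code` column names, the JSON layout (each "from" code mapped to a list of "to" codes), the helper names, and the made-up codes are all hypothetical, and the review sample is drawn uniformly at random with a fixed seed so it can be reproduced. The sample size is rounded up so that it never falls below 15%.

```python
import csv
import io
import json
import random
from collections import defaultdict


def group_one_to_many(rows):
    """Group (from_code, to_code) rows into {from_code: [to_code, ...]}, keeping order."""
    grouped = defaultdict(list)
    for src, dst in rows:
        if dst not in grouped[src]:  # skip repeated pairs
            grouped[src].append(dst)
    return dict(grouped)


def review_sample(from_codes, percent=15, seed=0):
    """Draw at least `percent`% of the "from" codes (rounded up) for semantic review."""
    codes = sorted(set(from_codes))
    k = (len(codes) * percent + 99) // 100  # integer ceiling division
    return random.Random(seed).sample(codes, k)


def to_csv(rows):
    """Render the crosswalk as CSV text: one row per (from_code, to_code) pair."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["from_code", "to_code"])
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Render the crosswalk as JSON: each "from" code maps to a list of "to" codes."""
    return json.dumps(group_one_to_many(rows), indent=2)


# Made-up codes: A01 needs two "to" codes, so the mapping is one-to-many.
rows = [("A01", "X10"), ("A01", "X11"), ("A02", "X12"), ("A03", "X12")]
print(to_csv(rows))
print(to_json(rows))
print(review_sample(src for src, _dst in rows))
```

Keeping the CSV in long format (one row per from/to pair) and the JSON grouped by "from" code represents the same one-to-many mapping in two shapes; which layout suits a given client is a design choice this sketch assumes, not something the process above prescribes.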

Get your correspondence tables now

Thousands of correspondence tables are instantly available in our store. Don't waste time on data preparation work that has already been done.
