Skip to content

Working with mtcars

Learn how to create a codebook for the classic mtcars dataset.


Sample Data

For this tutorial, we'll use the mtcars dataset from plotnine:

from plotnine.data import mtcars

print(mtcars.head())

This dataset contains fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).


Creating a Codebook

Basic Usage (Context Manager)

Create a professional codebook with variable descriptions and value labels:

from bookit_df import BookIt

with BookIt(
    "mtcars Codebook",
    output="mtcars_codebook.pdf",
    author="Your Name"
) as book:
    book.from_dataframe(
        mtcars,
        descriptions={
            "mpg": "Miles per gallon",
            "cyl": "Number of cylinders",
            "disp": "Displacement in cubic inches",
            "hp": "Horsepower",
            "drat": "Rear axle ratio",
            "wt": "Weight in 1000 lbs",
            "qsec": "1/4 mile time",
            "vs": "Engine shape (0 = V-shaped, 1 = straight)",
            "am": "Transmission (0 = automatic, 1 = manual)",
            "gear": "Number of forward gears",
            "carb": "Number of carburetor barrels",
        },
        value_labels={
            "vs": {0: "V-shaped", 1: "Straight"},
            "am": {0: "Automatic", 1: "Manual"},
            "gear": {
                3: "3 gears",
                4: "4 gears",
                5: "5 gears",
            },
            "cyl": {
                4: "4 cylinders",
                6: "6 cylinders",
                8: "8 cylinders",
            },
            "carb": {
                1: "1 barrel",
                2: "2 barrels",
                3: "3 barrels",
                4: "4 barrels",
                6: "6 barrels",
                8: "8 barrels",
            },
        },
        suppress_numeric_stats=["am", "vs"],
    )
    book.add_context("mpg", "Fuel economy measure for the 1970s era vehicles.")
# PDF saved automatically on exit!

Output

Here's what the generated codebook looks like:

Can't see the PDF?

If the embedded viewer doesn't work, you can download the PDF directly.


Key Concepts

Categorical Variables

Two variables in mtcars are categorical but stored as numbers:

Variable Values Meaning
vs 0, 1 Engine shape (V-shaped or Straight)
am 0, 1 Transmission type (Automatic or Manual)

Suppressing Numeric Stats

For these categorical variables, we suppress misleading summary statistics:

suppress_numeric_stats=["am", "vs"]

Why suppress?

Computing mean and standard deviation for categorical variables like am (transmission type) would be misleading—the "average" transmission type isn't meaningful.


Next Steps