
Addresses

rigour.addresses

Postal/location address handling

This set of helpers is designed for processing real-world addresses: composing an address from its individual parts, cleaning up common formatting errors, and normalizing addresses for comparison.

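from rigour.addresses import format_address_line

# Keys follow the address parts documented for format_address_line below.
address = {
    "street": "Bahnhofstr.",
    "house_number": "10",
    "postal_code": "86150",
    "city": "Augsburg",
    "state": "Bayern",
    "country": "Germany",
}
# Compose a single-line address using German conventions.
address_text = format_address_line(address, country="DE")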
Acknowledgements

The address formatting database contained in rigour/data/addresses/formats.yml is derived from worldwide.yml in the OpenCageData address-formatting repository. It is used to format each address according to the conventions of the country it belongs to.

clean_address(full)

Remove common formatting errors from addresses.

Source code in rigour/addresses/cleaning.py
def clean_address(full: str) -> str:
    """Remove common formatting errors from addresses."""
    while True:
        full, count = REPL.subn(_sub_match, full)
        if count == 0:
            break
    return full.strip()

format_address(address, country=None)

Format the given address parts into a multi-line string that matches the conventions of the country of the given address.

Parameters:

address (Dict[str, Optional[str]], required): The address parts to be combined. Common parts include:
    summary: A short description of the address.
    po_box: The PO box/mailbox number.
    street: The street or road name.
    house: The descriptive name of the house.
    house_number: The number of the house on the street.
    postal_code: The postal code or ZIP code.
    city: The city or town name.
    county: The county or district name.
    state: The state or province name.
    state_district: The state or province district name.
    state_code: The state or province code.
    country: The name of the country (words, not ISO code).
    country_code: A pre-normalized country code.
country (Optional[str], default None): ISO code for the country of the address.

Returns:

str: A multi-line string with the formatted address.

Source code in rigour/addresses/format.py
def format_address(
    address: Dict[str, Optional[str]], country: Optional[str] = None
) -> str:
    """Format the given address parts into a multi-line string that matches the
    conventions of the country of the given address.

    Args:
        address: The address parts to be combined. Common parts include:
            summary: A short description of the address.
            po_box: The PO box/mailbox number.
            street: The street or road name.
            house: The descriptive name of the house.
            house_number: The number of the house on the street.
            postal_code: The postal code or ZIP code.
            city: The city or town name.
            county: The county or district name.
            state: The state or province name.
            state_district: The state or province district name.
            state_code: The state or province code.
            country: The name of the country (words, not ISO code).
            country_code: A pre-normalized country code.
        country: ISO code for the country of the address.

    Returns:
        A multi-line string with the formatted address.
    """
    text = _format(address, country=country)
    prev: Optional[str] = None
    while prev != text:
        prev = text
        text = text.replace("\n\n", "\n").replace("\n ", "\n").strip()
    return text
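format_address delegates template selection and rendering to the internal _format helper (not shown here) and then collapses the blank lines that missing parts leave behind. The while loop matters because one str.replace pass does not catch runs of three or more newlines. A minimal sketch of the whole pipeline follows, assuming simplified templates in Python format-string syntax; TEMPLATES, DEFAULT_TEMPLATE, _Blank and format_address_sketch are hypothetical names, and rigour's real templates come from formats.yml.

```python
from typing import Dict, Optional

# Hypothetical per-country templates keyed by ISO code (much simpler than
# the OpenCage-derived templates rigour actually uses).
TEMPLATES: Dict[str, str] = {
    "DE": "{street} {house_number}\n{postal_code} {city}\n{country}",
    "US": "{house_number} {street}\n{city}, {state} {postal_code}\n{country}",
}
DEFAULT_TEMPLATE = "{house_number} {street}\n{postal_code} {city}\n{country}"


class _Blank(dict):
    """Mapping that renders any missing address part as an empty string."""

    def __missing__(self, key: str) -> str:
        return ""


def format_address_sketch(
    address: Dict[str, Optional[str]], country: Optional[str] = None
) -> str:
    # Pick the country's template, falling back to a generic one.
    template = TEMPLATES.get((country or "").upper(), DEFAULT_TEMPLATE)
    # Drop None values so they render as "" instead of "None".
    parts = _Blank({k: v for k, v in address.items() if v is not None})
    text = template.format_map(parts)
    # Same collapse loop as format_address: "a\n\n\nb" needs two passes.
    prev: Optional[str] = None
    while prev != text:
        prev = text
        text = text.replace("\n\n", "\n").replace("\n ", "\n").strip()
    return text


print(format_address_sketch({"city": "Augsburg", "country": "Germany"}, "DE"))
```

With only city and country set, the German template renders a blank first line and a leading space on the second; the loop reduces this to "Augsburg\nGermany".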

format_address_line(address, country=None)

Format the given address parts into a single-line string that matches the conventions of the country of the given address.

Parameters:

address (Dict[str, Optional[str]], required): The address parts to be combined. Common parts include:
    summary: A short description of the address.
    po_box: The PO box/mailbox number.
    street: The street or road name.
    house: The descriptive name of the house.
    house_number: The number of the house on the street.
    postal_code: The postal code or ZIP code.
    city: The city or town name.
    county: The county or district name.
    state: The state or province name.
    state_district: The state or province district name.
    state_code: The state or province code.
    country: The name of the country (words, not ISO code).
    country_code: A pre-normalized country code.
country (Optional[str], default None): ISO code for the country of the address.

Returns:

str: A single-line string with the formatted address.

Source code in rigour/addresses/format.py
def format_address_line(
    address: Dict[str, Optional[str]], country: Optional[str] = None
) -> str:
    """Format the given address parts into a single-line string that matches the
    conventions of the country of the given address.

    Args:
        address: The address parts to be combined. Common parts include:
            summary: A short description of the address.
            po_box: The PO box/mailbox number.
            street: The street or road name.
            house: The descriptive name of the house.
            house_number: The number of the house on the street.
            postal_code: The postal code or ZIP code.
            city: The city or town name.
            county: The county or district name.
            state: The state or province name.
            state_district: The state or province district name.
            state_code: The state or province code.
            country: The name of the country (words, not ISO code).
            country_code: A pre-normalized country code.
        country: ISO code for the country of the address.

    Returns:
        A single-line string with the formatted address.
    """
    line = ", ".join(_format(address, country=country).split("\n"))
    return clean_address(line)
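format_address_line reuses the multi-line rendering and joins its lines with ", ", so an empty line turns into an empty segment (", ,") that clean_address must then remove. The sketch below illustrates that joining step; to_single_line is a hypothetical name, and its cleanup is a deliberately simplified stand-in for clean_address that only drops empty comma-separated segments.

```python
def to_single_line(multi_line: str) -> str:
    # Same joining step as format_address_line: one segment per line.
    line = ", ".join(multi_line.split("\n"))
    # Simplified stand-in for clean_address: trim each comma-separated
    # segment and drop the empty ones produced by blank lines.
    segments = [segment.strip() for segment in line.split(",")]
    return ", ".join(segment for segment in segments if segment)


print(to_single_line("Bahnhofstr. 10\n\n86150 Augsburg\nGermany"))
```

The blank line yields "Bahnhofstr. 10, , 86150 Augsburg, Germany" after joining; the cleanup reduces it to "Bahnhofstr. 10, 86150 Augsburg, Germany".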

normalize_address(address, latinize=False, min_length=4, sep=WS)

Normalize the given address string for comparison. The transformation is destructive: the result is meant for matching, not for display.

Parameters:

address (str, required): The address to be normalized.
latinize (bool, default False): Whether to convert non-Latin characters to their Latin equivalents.
min_length (int, default 4): Minimum length of the normalized address; shorter results yield None.
sep (str, default WS): Separator placed between the tokens of the normalized address in place of whitespace.

Returns:

Optional[str]: The normalized address, or None if it is shorter than min_length.

Source code in rigour/addresses/normalize.py
def normalize_address(
    address: str, latinize: bool = False, min_length: int = 4, sep: str = WS
) -> Optional[str]:
    """Normalize the given address string for comparison, in a way that is destructive to
    the ability to display it (makes it ugly).

    Args:
        address: The address to be normalized.
        latinize: Whether to convert non-Latin characters to their Latin equivalents.
        min_length: Minimum length of the normalized address; shorter results yield None.
        sep: Separator placed between tokens of the normalized address instead of WS.

    Returns:
        The normalized address, or None if it is shorter than min_length.
    """
    norm_address = _normalize_address_text(address, latinize=latinize, sep=WS)
    norm_address = _common_replacer(latinize)(norm_address)
    if sep != WS:
        norm_address = norm_address.replace(WS, sep)
    if len(norm_address) < min_length:
        return None
    return norm_address
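The function normalizes the text, applies a table of common replacements, swaps in the requested separator and rejects results shorter than min_length. A self-contained sketch of the same sequence follows. It assumes WS is a single space and uses a tiny hypothetical abbreviation table (_COMMON) plus a crude accent-stripping stand-in for latinization, because rigour's _normalize_address_text and _common_replacer are not shown on this page. normalize_address_sketch is a hypothetical name.

```python
import re
import unicodedata
from typing import Dict, Optional

WS = " "  # assumption: the default separator is a single space

# Hypothetical replacement table; rigour's real lists are far larger.
_COMMON: Dict[str, str] = {"street": "st", "strasse": "str", "road": "rd"}


def normalize_address_sketch(
    address: str, latinize: bool = False, min_length: int = 4, sep: str = WS
) -> Optional[str]:
    # casefold() also folds "ß" to "ss".
    text = address.casefold()
    if latinize:
        # Crude stand-in: strip combining marks ("ö" -> "o"). Converting other
        # scripts (e.g. Cyrillic) to Latin would need a real transliterator.
        text = unicodedata.normalize("NFKD", text)
        text = "".join(c for c in text if not unicodedata.combining(c))
    # Turn punctuation into whitespace and split into tokens.
    tokens = re.sub(r"\W+", WS, text).split()
    # Replace whole tokens with their common short forms.
    norm = WS.join(_COMMON.get(token, token) for token in tokens)
    if sep != WS:
        norm = norm.replace(WS, sep)
    if len(norm) < min_length:
        return None
    return norm


print(normalize_address_sketch("Main Street 1", sep="-"))
```

Returning None for very short results keeps near-empty strings such as "a 1" from matching one another during comparison.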