Addresses
rigour.addresses
This module provides a set of tools for handling postal/geographic addresses. It includes functions for normalising addresses for comparison purposes, and for formatting addresses given in parts for display as a single string.
Postal address formatting
This set of helpers is designed to help with the processing of real-world addresses, including composing an address from individual parts, and cleaning it up.
from rigour.addresses import format_address_line
address = {
    "road": "Bahnhofstr.",
    "house_number": "10",
    "postcode": "86150",
    "city": "Augsburg",
    "state": "Bayern",
    "country": "Germany",
}
address_text = format_address_line(address, country="DE")
Acknowledgements
The address formatting database contained in rigour/data/addresses/formats.yml is derived from worldwide.yml in the OpenCageData address-formatting repository. It is used to format addresses according to the customs of the country in which the address is located.
clean_address(full)
format_address(address, country=None)
Format the given address parts into a multi-line string that matches the conventions of the country of the given address.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
address | Dict[str, Optional[str]] | The address parts to be combined. Common parts include: summary: A short description of the address. po_box: The PO box/mailbox number. street: The street or road name. house: The descriptive name of the house. house_number: The number of the house on the street. postal_code: The postal code or ZIP code. city: The city or town name. county: The county or district name. state: The state or province name. state_district: The state or province district name. state_code: The state or province code. country: The name of the country (words, not ISO code). country_code: A pre-normalized country code. | required |
country | Optional[str] | ISO code for the country of the address. | None |
Returns:
Type | Description |
---|---|
str | A multi-line string with the formatted address. |
Source code in rigour/addresses/format.py
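To make the idea of country-specific formatting concrete, here is a minimal, standard-library sketch of template-based rendering. It is not the rigour implementation: `TEMPLATES`, `DEFAULT_TEMPLATE` and `sketch_format_address` are hypothetical names, the two templates are invented for illustration, and the real formats come from formats.yml. The part names follow the example at the top of this page.

```python
from collections import defaultdict
from typing import Dict, Optional

# Hypothetical templates for illustration only; the real, much richer
# formats are stored in formats.yml and are not reproduced here.
TEMPLATES: Dict[str, str] = {
    "DE": "{road} {house_number}\n{postcode} {city}\n{country}",
    "US": "{house_number} {road}\n{city}, {state} {postcode}\n{country}",
}
DEFAULT_TEMPLATE = "{road} {house_number}\n{city} {postcode}\n{country}"


def sketch_format_address(
    address: Dict[str, Optional[str]], country: Optional[str] = None
) -> str:
    """Render address parts into a multi-line string using a country template."""
    template = TEMPLATES.get((country or "").upper(), DEFAULT_TEMPLATE)
    # Missing or None parts render as empty strings instead of raising KeyError.
    parts = defaultdict(str, {k: v for k, v in address.items() if v})
    rendered = template.format_map(parts)
    lines = []
    for line in rendered.split("\n"):
        # Collapse whitespace left by empty parts and trim stray separators.
        cleaned = " ".join(line.split()).strip(", ")
        if cleaned:
            lines.append(cleaned)
    return "\n".join(lines)


example = {
    "road": "Bahnhofstr.",
    "house_number": "10",
    "postcode": "86150",
    "city": "Augsburg",
    "country": "Germany",
}
print(sketch_format_address(example, country="DE"))
```

The sketch drops lines that end up empty, so an address with few known parts still renders without blank lines.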
format_address_line(address, country=None)
Format the given address parts into a single-line string that matches the conventions of the country of the given address.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
address | Dict[str, Optional[str]] | The address parts to be combined. Common parts include: summary: A short description of the address. po_box: The PO box/mailbox number. street: The street or road name. house: The descriptive name of the house. house_number: The number of the house on the street. postal_code: The postal code or ZIP code. city: The city or town name. county: The county or district name. state: The state or province name. state_district: The state or province district name. state_code: The state or province code. country: The name of the country (words, not ISO code). country_code: A pre-normalized country code. | required |
country | Optional[str] | ISO code for the country of the address. | None |
Returns:
Type | Description |
---|---|
str | A single-line string with the formatted address. |
Source code in rigour/addresses/format.py
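The relationship between a multi-line and a single-line rendering can be sketched as follows. This is an illustration under assumptions, not the library's code: `sketch_to_single_line` is a hypothetical helper, and the `", "` separator is a guess, since this page does not state how rigour joins the lines.

```python
def sketch_to_single_line(formatted: str, separator: str = ", ") -> str:
    """Join the non-empty lines of a formatted address into one line."""
    # Normalise whitespace within each line before joining.
    lines = [" ".join(line.split()) for line in formatted.splitlines()]
    # Skip blank lines so the separator never appears twice in a row.
    return separator.join(line for line in lines if line)


multi_line = "Bahnhofstr. 10\n86150 Augsburg\n\nGermany"
print(sketch_to_single_line(multi_line))
```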
normalize_address(address, latinize=False, min_length=4)
Normalize the given address string for comparison. The normalization is destructive: the result is ugly and no longer suitable for display.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
address | str | The address to be normalized. | required |
latinize | bool | Whether to convert non-Latin characters to their Latin equivalents. | False |
min_length | int | Minimum length of the normalized address. | 4 |
Returns:
Type | Description |
---|---|
Optional[str] | The normalized address. |
Source code in rigour/addresses/normalize.py
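As a rough illustration of what comparison-oriented normalization can look like, the sketch below lowercases the text, replaces punctuation with spaces and collapses whitespace. It is an approximation written for this page, not the rigour algorithm: `sketch_normalize_address` is a hypothetical name, the `latinize` option is left out, and returning `None` for results shorter than `min_length` is an assumption based on the `Optional[str]` return type.

```python
from typing import Optional


def sketch_normalize_address(address: str, min_length: int = 4) -> Optional[str]:
    """Reduce an address to lowercase alphanumeric tokens for comparison."""
    # Keep letters and digits; turn everything else into a separator.
    chars = [char if char.isalnum() else " " for char in address.lower()]
    # Collapse runs of separators into single spaces and trim the ends.
    normalized = " ".join("".join(chars).split())
    # Assumption: values that are too short are treated as unusable.
    if len(normalized) < min_length:
        return None
    return normalized


print(sketch_normalize_address("Bahnhofstr. 10, 86150 Augsburg"))
```

Two differently punctuated spellings of the same address then compare equal, which is the purpose of this step.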
remove_address_keywords(address, latinize=False, replacement=WS)
Remove common address keywords (such as "street", "road", "south", etc.) from the given address string. The address string is assumed to have already been normalized using normalize_address.
The output may contain multiple consecutive whitespace characters, which are not collapsed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
address | str | The address to be cleaned. | required |
latinize | bool | Whether to convert non-Latin characters to their Latin equivalents. | False |
Returns:
Type | Description |
---|---|
Optional[str] | The address, without any stopwords. |
Source code in rigour/addresses/normalize.py
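The note about uncollapsed whitespace is easiest to see in a small example. The sketch below is a stand-in, not the library's code: `KEYWORDS` is a tiny hypothetical list, `sketch_remove_keywords` is a hypothetical name, and the single-space default for `replacement` is an assumption about what `WS` stands for.

```python
import re

# Hypothetical keyword list; the library ships its own, larger data set.
KEYWORDS = ("street", "road", "avenue", "south", "north")
# Match whole words only, so "streetcar" is left alone.
_KEYWORD_RE = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, KEYWORDS)))


def sketch_remove_keywords(address: str, replacement: str = " ") -> str:
    """Replace each keyword in a normalized address with `replacement`."""
    return _KEYWORD_RE.sub(replacement, address)


removed = sketch_remove_keywords("10 south main street")
print(repr(removed))
# Callers that need clean spacing can collapse the whitespace themselves.
print(" ".join(removed.split()))
```

Because every keyword is swapped for a space while the neighbouring spaces stay in place, runs of whitespace appear in the output until the caller collapses them.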
shorten_address_keywords(address, latinize=False)
Shorten common address keywords (such as "street", "road", "south", etc.) in the given address string. The address string is assumed to have already been normalized using normalize_address.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
address | str | The address to be cleaned. | required |
latinize | bool | Whether to convert non-Latin characters to their Latin equivalents. | False |
Returns:
Type | Description |
---|---|
Optional[str] | The address, with keywords shortened. |
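Shortening, as opposed to removing, keeps a compact marker for each keyword so that "street" and "st" end up looking the same. The sketch below illustrates the idea with a hypothetical `ABBREVIATIONS` table and a hypothetical function name `sketch_shorten_keywords`. The abbreviations rigour actually uses are not listed on this page and may differ.

```python
import re
from typing import Dict

# Hypothetical mapping from long forms to short forms, for illustration only.
ABBREVIATIONS: Dict[str, str] = {
    "street": "st",
    "road": "rd",
    "avenue": "ave",
    "south": "s",
    "north": "n",
}
# Match whole words only, so partial matches inside longer words are ignored.
_LONG_FORM_RE = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, ABBREVIATIONS)))


def sketch_shorten_keywords(address: str) -> str:
    """Replace each long-form keyword in a normalized address with its short form."""
    return _LONG_FORM_RE.sub(lambda match: ABBREVIATIONS[match.group(0)], address)


# Both spellings converge on the same string, which helps comparison.
print(sketch_shorten_keywords("10 south main street"))
print(sketch_shorten_keywords("10 s main st"))
```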