Detect location from given text using NLP
It is possible to infer the location (country) from a given text using natural language processing (NLP) techniques. One approach is to use named entity recognition (NER) to identify location names in the text, and then use a geocoding service to map those locations to countries.
Here is an example in Python that uses the spaCy library for NLP and the GeoPy library for geocoding:
!pip install spacy geopy

import spacy
from geopy.geocoders import Nominatim

# Load the SpaCy NLP model
nlp = spacy.load('en_core_web_sm')

# Initialize the GeoPy geocoder
geolocator = Nominatim(user_agent='my_app')

# Define a function to extract location names from a text using SpaCy NER
def extract_locations(text):
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ == 'LOC']

# Define a function to map location names to countries using GeoPy geocoding
def map_locations_to_countries(locations):
    countries = set()
    for location in locations:
        try:
            location = geolocator.geocode(location, addressdetails=True, exactly_one=True)
            country = location.raw['address']['country']
            countries.add(country)
        except:
            pass
    return countries

# Example text to infer country from
text = "I am traveling to Paris next month."

# Extract location names from the text
locations = extract_locations(text)

# Map location names to countries
countries = map_locations_to_countries(locations)

# Print the inferred countries
print(countries)
In this example, we first load the SpaCy NLP model and initialize the GeoPy geocoder. We then define a function extract_locations to extract location names from a text using SpaCy NER, and a function map_locations_to_countries to map those locations to countries using GeoPy geocoding. Finally, we apply these functions to an example text “I am traveling to Paris next month” and print the inferred countries, which in this case should be France.
Note that this approach is not perfect and may not work for all texts, especially if they do not contain explicit location names or if the location names are ambiguous. Additionally, the geocoding service may have limitations or inaccuracies in mapping locations to countries. Therefore, it’s important to use this approach as part of a broader analysis and to validate the results with other sources of information.
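One way to make the geocoding limitations easier to see is to handle "no match" and "no country in the result" explicitly instead of swallowing every exception. The following is a minimal sketch, not a definitive implementation: the function name countries_for is hypothetical, the geocoder is passed in as a callable (for example Nominatim(user_agent='my_app').geocode), and it assumes the result object exposes a raw dictionary the way geopy's Location does. Network errors are deliberately not caught here.

```python
from typing import Callable, Iterable, Set


def countries_for(names: Iterable[str], geocode: Callable[..., object]) -> Set[str]:
    """Return the country names for the place names that geocode successfully.

    `geocode` is assumed to behave like geopy's Nominatim.geocode: it returns
    None when nothing matches, otherwise an object with a `.raw` dict.
    """
    countries = set()
    for name in names:
        # language="en" asks for English names, so the same country is not
        # returned under several spellings.
        result = geocode(name, addressdetails=True, exactly_one=True, language="en")
        if result is None:
            # No match at all: skip instead of raising AttributeError.
            continue
        country = result.raw.get("address", {}).get("country")
        if country:
            countries.add(country)
    return countries
```

With geopy installed, a call would look like countries_for(['Paris'], Nominatim(user_agent='my_app').geocode). Passing the geocoder in also makes it possible to check the function with a stub, without touching the network.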
As of May 11, 2025, it does not work out of the box!
Perhaps the up-to-date en_core_web_sm model has been retrained so that Paris is no longer labeled LOC but GPE (geopolitical entity).
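A minimal sketch of a filter that accepts both labels, assuming this is the only change the extraction step needs. In spaCy's English label scheme, GPE covers countries, cities, and states, while LOC covers non-GPE locations such as mountain ranges and bodies of water. The names PLACE_LABELS and extract_place_names are hypothetical, and the namedtuple only stands in for spaCy entities so the snippet runs without a model.

```python
from collections import namedtuple

# Labels treated as place names (an assumption of this sketch).
PLACE_LABELS = {"GPE", "LOC"}


def extract_place_names(ents, labels=PLACE_LABELS):
    """Return the text of every entity whose label is in `labels`."""
    return [ent.text for ent in ents if ent.label_ in labels]


# Stand-in for spaCy entities; real ones also expose .text and .label_.
Ent = namedtuple("Ent", ["text", "label_"])
sample = [Ent("Paris", "GPE"), Ent("Alps", "LOC"), Ent("French", "NORP")]

print(extract_place_names(sample))  # ['Paris', 'Alps']
```

With a loaded model, the same function would be called as extract_place_names(nlp(text).ents).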
In my case, though, I am also trying to find a way to recognize adjectival forms and demonyms such as French, Spanish, and German.
In spaCy they fall under NORP (nationalities or religious or political groups).
These are the test results:
================== Analyzing Text 1 ==================
Input Text: “What about España or the UK? Let’s also consider Tokyo.”
— All Entities Found by SpaCy —
Text: ‘España’, Label: PERSON, Start: 11, End: 17
Text: ‘UK’, Label: GPE, Start: 25, End: 27
Text: ‘Tokyo’, Label: GPE, Start: 49, End: 54
— GPE/LOC Entities Sent for Geocoding —
[‘UK’, ‘Tokyo’]
Attempting to geocode: [‘UK’, ‘Tokyo’]
Geocoded ‘UK’ -> Country: ‘Central African Republic’ (Full address: Ouaka, Central African Republic)
Geocoded ‘Tokyo’ -> Country: ‘Japan’ (Full address: Tokyo, Japan)
— Inferred Countries (from GPE/LOC entities) —
[‘Central African Republic’, ‘Japan’]
=====================================================
================== Analyzing Text 2 ==================
Input Text: “The document mentions french wine and Italian pasta, with a focus on a German car manufacturer.”
— All Entities Found by SpaCy —
Text: ‘french’, Label: NORP, Start: 22, End: 28
Text: ‘Italian’, Label: NORP, Start: 38, End: 45
Text: ‘German’, Label: NORP, Start: 71, End: 77
— GPE/LOC Entities Sent for Geocoding —
[‘french’, ‘Italian’, ‘German’]
Attempting to geocode: [‘french’, ‘Italian’, ‘German’]
Geocoded ‘french’ -> Country: ‘Argentina’ (Full address: French, Cuartel Doce de Octubre, Partido de Nueve de Julio, Buenos Aires, 6516, Argentina)
Geocoded ‘Italian’ -> Country: ‘Ukraine’ (Full address: Italian, Piatykhatky, Київський район, Kharkiv, Kharkiv Urban Hromada, Kharkiv Raion, Kharkiv Oblast, Ukraine)
Geocoded ‘German’ -> Country: ‘Germany’ (Full address: Germany)
— Inferred Countries (from GPE/LOC entities) —
[‘Argentina’, ‘Germany’, ‘Ukraine’]
=====================================================
Summary:
– “España” was labeled PERSON by spaCy, so it was ignored
– “UK” was geocoded by geopy as Central African Republic
– “french” was geocoded as Argentina
– “Italian” as Ukraine
– only “German” was correctly geocoded as Germany
– plus “Tokyo” as Japan
It is very strange that so far there is no ready-made, out-of-the-box solution for this simple task. The approach I see so far is to take Wikipedia’s table of adjectival and demonymic forms (https://en.wikipedia.org/wiki/List_of_adjectival_and_demonymic_forms_for_countries_and_nations) and write custom mappings from it.
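A custom mapping of that kind could be sketched as a plain dictionary lookup that runs before (or instead of) geocoding. Everything here is an assumption for illustration: the names DEMONYM_TO_COUNTRY, COUNTRY_ALIASES, and country_from_term are hypothetical, and the handful of entries is typed in by hand rather than generated from the Wikipedia table.

```python
from typing import Optional

# Hand-written sample entries; a real table would be built from the
# Wikipedia list of adjectival and demonymic forms.
DEMONYM_TO_COUNTRY = {
    "french": "France",
    "italian": "Italy",
    "german": "Germany",
    "spanish": "Spain",
    "japanese": "Japan",
}

# Abbreviations and native names that tripped up the NER or the geocoder.
COUNTRY_ALIASES = {
    "uk": "United Kingdom",
    "españa": "Spain",
}


def country_from_term(term: str) -> Optional[str]:
    """Map a demonym, adjective, or alias to a country name, or None."""
    key = term.strip().casefold()  # case-insensitive match
    return DEMONYM_TO_COUNTRY.get(key) or COUNTRY_ALIASES.get(key)


for term in ["french", "Italian", "UK", "España"]:
    print(term, "->", country_from_term(term))
```

Terms that spaCy labels NORP would go through this lookup rather than the geocoder, and only GPE or LOC entities that miss the alias table would be sent on to geocoding.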