Geospatial Analytics using Python and SQL

A Crash Course in PostGIS

This adapted from Purdue’s HONR 39900-054: Foundations in Geospatial Analytics. For more information and examples, please head to the course’s GitHub site: github.com/gouldju1/honr39900-foundations-of-geospatial-analytics.

Required Packages

*In[1]:*

import geopandas as gpd
from sqlalchemy import create_engine, text
from sqlalchemy_utils import create_database, database_exists, drop_database
import pandas as pd
import sqlalchemy

Load Data #1

Today, we will be working with the boroughs data from Chapter 11 of our textbook.

*In[2]:*

nyc = gpd.read_file("./nyc_boroughs/geo_export_b38c64b9-7e50-43c7-8880-994b22440dbf.shp")

*In[3]:*

nyc.head()

*Out[3]:*

[cols=",,,,,",options="header",]
|===
| |boro_code |boro_name |shape_area |shape_leng |geometry
|0 |1.0 |Manhattan |9.442947e+08 |203803.216852 |MULTIPOLYGON
(((-74.04388 40.69019, -74.04351 ...

|1 |2.0 |Bronx |1.598380e+09 |188054.198841 |POLYGON ((-73.86477
40.90201, -73.86305 40.901...

|2 |3.0 |Brooklyn |2.684411e+09 |234928.658563 |POLYGON ((-73.92722
40.72533, -73.92654 40.724...

|3 |4.0 |Queens |3.858050e+09 |429586.630985 |POLYGON ((-73.77896
40.81171, -73.76371 40.793...

|4 |5.0 |Staten Island |2.539686e+09 |212213.139971 |POLYGON ((-74.05581
40.64971, -74.05619 40.639...
|===

*In[4]:*

nyc.plot()

Connect to Postgres

*In[5]:*

#Variables
db_type = "postgres"
username = "postgres"
password = ""
host = "localhost"
port = "5432"
db_name = "demo"

#Put it together
engine = create_engine(f"{db_type}://{username}:{password}@{host}:{port}/{db_name}")

*In[21]:*

#Write nyc to PostgreSQL
nyc.to_postgis(name="boroughs_chapter_11", con=engine)

*In[6]:*

engine.table_names() #We see that our table was added to our database

*Out[6]:*

['spatial_ref_sys',
 'london',
 'airbnb_homework_4',
 'streets_chapter_11',
 'boroughs_chapter_11']

*In[7]:*

#List Schemas
insp = sqlalchemy.inspect(engine)
insp.get_schema_names()

*Out[7]:*

Review: Accessing our Postgres Data

Whenever we use Jupyter Notebooks to access and use our data from postgres, you can use the beginning of this notebook as a standard boilerplate. Be sure to take the notebook with you from class! :)

*In[8]:*

#SQL query
sql = "SELECT * FROM public.boroughs_chapter_11"

#Specify name of column which stores our geometry! In table streets_chapter_11, the geometry is stored in a col called geometry geom_col = "geometry"

#Execute query to create GeoDataFrame nyc_from_db = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col=geom_col)

*In[9]:*

nyc_from_db.head() #Yay!

*Out[9]:*

	boro_code	boro_name	shape_area	shape_leng	geometry
0	1.0	Manhattan	9.442947e+08	203803.216852	MULTIPOLYGON (((-74.04388 40.69019, -74.04351 …
1	2.0	Bronx	1.598380e+09	188054.198841	POLYGON ((-73.86477 40.90201, -73.86305 40.901…
2	3.0	Brooklyn	2.684411e+09	234928.658563	POLYGON ((-73.92722 40.72533, -73.92654 40.724…
3	4.0	Queens	3.858050e+09	429586.630985	POLYGON ((-73.77896 40.81171, -73.76371 40.793…
4	5.0	Staten Island	2.539686e+09	212213.139971	POLYGON ((-74.05581 40.64971, -74.05619 40.639…

Agenda

We have a LOT to cover today…here’s a snapshot:

Topic Name	Textbook Chapter/Section
Creating a single `MULTIPOLYGON` from many	11.1.1
Creating a `LINESTRING` from `POINT`s	11.1.2
Clipping	11.2.1
Splitting	11.2.2
Tessellating	11.2.3
Sharding	11.2.3
Segmentizing `LINESTRING`s	11.3.1
Scaling	11.4.2
Rotating	11.4.3

Creating a single `MULTIPOLYGON` from many

Textbook Chapter/Section: 11.1.1
Textbook Start Page: 310

In many cases, you may have a city where records are broken out by districts, neighborhoods, boroughs, or precincts because you often need to view or report on each neighborhood separately. Sometimes, however, for reporting purposes you need to view the city as a single unit. In this case you can use the ST_UNION aggregate function to amass one single multipolygon from constituent multipolygons.

For example, the largest city in the United States is New York, made up of the five storied boroughs of Manhattan, Bronx, Queens, Brooklyn, and Staten Island. To aggregate New York, you first need to create a boroughs table with five records—one multipolygon for each of the boroughs with littorals:

Then you can use the ST_Union spatial aggregate function to group all the boroughs into a single city, as follows:

*In[10]:*

sql = """
SELECT
    ST_Union(geometry) AS city
FROM
    public.boroughs_chapter_11;
"""

example_11_1_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="city") #Note the change in geom_col!

*In[11]:*

example_11_1_1.head()

*Out[11]:*

	city
0	MULTIPOLYGON (((-74.03844 40.55674, -74.04955 …

*In[12]:*

example_11_1_1.plot()

Let’s work through an example in the San Francisco area using our table of cities. This example lists cities that straddle multiple records, how many polygons each city straddles, and how many polygons you’ll be left with after dissolving boundaries within each city.

*In[13]:*

sql = """
SELECT
    city,
    COUNT(city) AS num_records,
    SUM(ST_NumGeometries(geom)) AS numpoly_before,
    ST_NumGeometries(ST_Multi(ST_Union(geom))) AS num_poly_after,
    ST_PointFromText('POINT(0 0)') AS dummy
FROM
    ch11.cities
GROUP BY
    city, dummy
HAVING
    COUNT(city) > 1;
"""

ex_11_1_1_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="dummy") #Note the change in geom_col!

*In[14]:*

ex_11_1_1_2

*Out[14]:*

	city	num_records	numpoly_before	num_poly_after	dummy
0	ALAMEDA	4	4	4	POINT (0.00000 0.00000)
1	BELVEDERE TIBURON	2	2	2	POINT (0.00000 0.00000)
2	BRISBANE	2	2	1	POINT (0.00000 0.00000)
3	GREENBRAE	2	2	2	POINT (0.00000 0.00000)
4	LARKSPUR	2	2	2	POINT (0.00000 0.00000)
5	REDWOOD CITY	2	2	2	POINT (0.00000 0.00000)
6	SAN FRANCISCO	7	7	6	POINT (0.00000 0.00000)
7	SAN MATEO	2	2	2	POINT (0.00000 0.00000)
8	SOUTH SAN FRANCISCO	2	2	2	POINT (0.00000 0.00000)
9	SUISUN CITY	2	2	2	POINT (0.00000 0.00000)

From the code above, you know that ten cities have multiple records, but you’ll only be able to dissolve the boundaries of Brisbane and San Francisco, because only these two have fewer polygons per geometry than what you started out with.

Below, you will aggregate and insert the aggregated records into a table called ch11.distinct_cities. You then add a primary key to each city to ensure that you have exactly one record per city…

*In[15]:*

#New Table
sql = """SELECT city, ST_Multi(
    ST_Union(geom))::geometry(multipolygon,2227) AS geom
FROM ch11.cities
GROUP BY city, ST_SRID(geom);
"""
new_ex_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note the change in geom_col!

*In[16]:*

new_ex_1.plot()

*In[18]:*

#Add to db
new_ex_1.to_postgis(name="distinct_cities", schema="ch11", con=engine)

*In[19]:*

#New Table
sql = """
SELECT
    *
FROM
    ch11.distinct_cities
"""
new_ex_1_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note the change in geom_col!

*In[20]:*

new_ex_1_1.head()

*Out[20]:*

	city	geom
0	ALAMEDA	MULTIPOLYGON (((6062440.660 2099431.510, 60623…
1	ALAMO	MULTIPOLYGON (((6142769.437 2132420.993, 61418…
2	ALBANY	MULTIPOLYGON (((6045760.680 2154419.700, 60461…
3	ALVISO	MULTIPOLYGON (((6133654.370 1993195.000, 61330…
4	AMERICAN CANYON	MULTIPOLYGON (((6072029.630 2267404.510, 60811…

city

geom

ALAMEDA

MULTIPOLYGON (((6062440.660 2099431.510, 60623…

ALAMO

MULTIPOLYGON (((6142769.437 2132420.993, 61418…

ALBANY

MULTIPOLYGON (((6045760.680 2154419.700, 60461…

ALVISO

MULTIPOLYGON (((6133654.370 1993195.000, 61330…

AMERICAN CANYON

MULTIPOLYGON (((6072029.630 2267404.510, 60811…

*In[21]:*

new_ex_1_1.plot()

In the above code, you create and populate a new table called ch11.distinct_cities. You use the ST_Multi function to ensure that all the resulting geometries will be multipolygons and not polygons. If a geometry has a single polygon, ST_Multi will upgrade it to be a multipolygon with a single polygon. Then you cast the geometry using typmod to ensure that the geometry type and spatial reference system are correctly registered in the geometry_columns view. For good measure we also put in a primary key and a spatial index.

Creating a `LINESTRING` from `POINT`s

Textbook Chapter/Section: 11.1.2
Textbook Start Page: 312

In the past two decades, the use of GPS devices has gone mainstream. GPS Samaritans spend their leisure time visiting points of interest (POI), taking GPS readings, and sharing their adventures via the web. Common venues have included local taverns, eateries, fishing holes, and filling stations with the lowest prices. A common follow-up task after gathering the raw positions of the POIs is to connect them to form an unbroken course.

In this exercise, you’ll use Australian track points to create linestrings. These track points consist of GPS readings taken during a span of about ten hours from afternoon to early morning on a wintry July day. We have no idea of what the readings represent. Let’s say a zoologist fastened a GPS around the neck of a roo and tracked her for an evening. The readings came in every ten seconds or so, but instead of creating one line-string with more than two thousand points, you’ll divide the readings into 15-minute intervals and create separate linestrings for each of the intervals.

ST_MakeLine is the spatial aggregate function that takes a set of points and forms a linestring out of them. You can add an ORDER BY clause to aggregate functions; this is particularly useful when you need to control the order in which aggregation occurs. In this example, you’ll order by the input time of the readings.

*In[22]:*

sql = """
SELECT
    DATE_TRUNC('minute',time) -
CAST( mod(
                CAST(DATE_PART('minute',time) AS integer),15
            ) ||' minutes' AS interval
        ) AS track_period,
    MIN(time) AS t_start,
    MAX(time) AS t_end,
    ST_MakeLine(geom ORDER BY time) AS geom
INTO ch11.aussie_run
FROM ch11.aussie_track_points
GROUP BY track_period
HAVING COUNT(time) > 1;
SELECT
    CAST(track_period AS timestamp),
    CAST(t_start AS timestamp) AS t_start,
    CAST(t_end AS timestamp) AS t_end,
    ST_NPoints(geom) AS number_of_points,
    CAST(ST_Length(geom::geography) AS integer) AS dist_m,
    (t_end - t_start) AS dur,
    geom
FROM ch11.aussie_run;
"""

example_11_1_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom")

*In[23]:*

example_11_1_2.head()

*Out[23]:*

	track_period	t_start	t_end	number_of_points	dist_m	dur	geom
0	2009-07-18 04:30:00	2009-07-18 04:30:00	2009-07-18 04:44:59	133	2705	0 days 00:14:59	LINESTRING (152.95709 -27.16209, 152.95704 -27…
1	2009-07-18 04:45:00	2009-07-18 04:45:05	2009-07-18 04:55:20	87	1720	0 days 00:10:15	LINESTRING (152.94813 -27.17278, 152.94828 -27…
2	2009-07-18 05:00:00	2009-07-18 05:02:00	2009-07-18 05:14:59	100	1530	0 days 00:12:59	LINESTRING (152.94259 -27.17509, 152.94265 -27…
3	2009-07-18 05:15:00	2009-07-18 05:15:05	2009-07-18 05:29:59	149	3453	0 days 00:14:54	LINESTRING (152.94276 -27.17723, 152.94263 -27…
4	2009-07-18 05:30:00	2009-07-18 05:30:05	2009-07-18 05:43:58	132	3059	0 days 00:13:53	LINESTRING (152.94757 -27.18276, 152.94754 -27…

track_period

t_start

t_end

number_of_points

dist_m

dur

geom

2009-07-18 04:30:00

2009-07-18 04:44:59

133

2705

0 days 00:14:59

LINESTRING (152.95709 -27.16209, 152.95704 -27…

2009-07-18 04:45:00

2009-07-18 04:45:05

2009-07-18 04:55:20

1720

0 days 00:10:15

LINESTRING (152.94813 -27.17278, 152.94828 -27…

2009-07-18 05:00:00

2009-07-18 05:02:00

2009-07-18 05:14:59

100

1530

0 days 00:12:59

LINESTRING (152.94259 -27.17509, 152.94265 -27…

2009-07-18 05:15:00

2009-07-18 05:15:05

2009-07-18 05:29:59

149

3453

0 days 00:14:54

LINESTRING (152.94276 -27.17723, 152.94263 -27…

2009-07-18 05:30:00

2009-07-18 05:30:05

2009-07-18 05:43:58

132

3059

0 days 00:13:53

LINESTRING (152.94757 -27.18276, 152.94754 -27…

*In[24]:*

example_11_1_2.plot()

First you create a column called track_period specifying quarter-hour slots starting on the hour, 15 minutes past, 30 minutes past, and 45 minutes past. You allocate each GPS point into the slots and create separate linestrings from each time slot via the GROUP BY clause. Not all time slots need to have points, and some slots may have a single point. If a slot is devoid of points, it won’t be part of the output. If a slot only has one point, it’s removed. For the allocation, you use the data_part function and the modulo operator.

Within each slot, you create a linestring using ST_MakeLine. You want the line to follow the timing of the measurements, so you add an ORDER BY clause to ST_MakeLine.

The SELECT inserts directly into a new table called aussie_run. (If you’re not running this code for the first time, you’ll need to drop the aussie_run table first.) Finally, you query aussie_run to find the number of points in each linestring using ST_NPoints, subtracting the time of the last point from the time of the first point to get a duration, and using ST_Length to compute the distance covered between the first and last points within the 15-minute slot. Note that you cast the geometry in longitude and latitude to geography to ensure you have a measurable unit—meters.

Clipping

Textbook Chapter/Section: 11.2.1
Textbook Start Page: 314

Clipping uses one geometry to cut another during intersection.Clipping uses one geometry to cut another during intersection. In this section, we’ll explore other functions available to you for clipping and splitting.In this section, we’ll explore other functions available to you for clipping and splitting.

As the name implies, clipping is the act of removing unwanted sections of a geometry, leaving behind only what’s of interest. Think of clipping coupons from a newspaper, clipping hair from someone’s head, or the moon clipping the sun in a solar eclipse.

Difference and symmetric difference are operations closely related to intersection. They both serve to return the remainder of an intersection. ST_Difference is a non—commutative function, whereas ST_SymDifference is, as the name implies, commutative.As the name implies, clipping is the act of removing unwanted sections of a geometry, leaving behind only what’s of interest. Think of clipping coupons from a newspaper, clipping hair from someone’s head, or the moon clipping the sun in a solar eclipse.

Difference and symmetric difference are operations closely related to intersection. They both serve to return the remainder of an intersection. ST_Difference is a non—commutative function, whereas ST_SymDifference is, as the name implies, commutative.

Difference functions return the geometry of what’s left out when two geometries intersect. When given geometries A and B, ST_Difference(A,B) returns the portion of A that’s not shared with B, whereas ST_SymDifference(A,B) returns the portion of A and B that’s not shared. Difference functions return the geometry of what’s left out when two geometries intersect. When given geometries A and B, ST_Difference(A,B) returns the portion of A that’s not shared with B, whereas ST_SymDifference(A,B) returns the portion of A and B that’s not shared.

ST_SymDifference(A,B) =  Union(A,B) - Intersection(A,B)
ST_Difference(A,B) = A - Intersection(A,B)

Example: What’s left of the polygon and line after clipping

Here, you’re getting the difference between a linestring and polygon

*In[25]:*

#This is what our polygon looks like
sql = """
SELECT ST_GeomFromText(
        'POLYGON((
            2 4.5,3 2.6,3 1.8,2 0,-1.5 2.2,
            0.056 3.222,-1.5 4.2,2 6.5,2 4.5
        ))'
    ) AS geom1
"""

poly = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom1") #Note change of geom_col! poly.plot()

*In[26]:*

#This is what our polygon looks like
sql = """
SELECT ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') AS geom2
"""

LS = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom2") #Note change of geom_col! LS.plot()

*In[27]:*

#The difference between the polygon and linestring is a polygon
sql = """
SELECT
    ST_Intersects(g1.geom1,g1.geom2) AS they_intersect,
    GeometryType(
        ST_Difference(g1.geom1,g1.geom2) ) AS intersect_geom_type,
    ST_Difference(g1.geom1,g1.geom2) AS intersect
FROM (
    SELECT ST_GeomFromText(
        'POLYGON((
            2 4.5,3 2.6,3 1.8,2 0,-1.5 2.2,
            0.056 3.222,-1.5 4.2,2 6.5,2 4.5
        ))'
    ) AS geom1,
    ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') AS geom2
) AS g1;
"""

example_11_2_1_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="intersect") #Note change of geom_col! example_11_2_1_1.head()

*Out[27]:*

	they_intersect	intersect_geom_type	intersect
0	True	POLYGON	POLYGON ((2.00000 4.50000, 3.00000 2.60000, 3….

they_intersect

intersect_geom_type

intersect

True

POLYGON

POLYGON ((2.00000 4.50000, 3.00000 2.60000, 3….

*In[28]:*

example_11_2_1_1.plot()

*In[29]:*

#The difference between the linestring and polygon is a multilinestring
sql = """
SELECT
    ST_Intersects(g1.geom1,g1.geom2) AS they_intersect,
    GeometryType(
        ST_Difference(g1.geom2,g1.geom1) ) AS intersect_geom_type,
    ST_Difference(g1.geom2,g1.geom1) AS intersect
FROM (
    SELECT ST_GeomFromText(
        'POLYGON((
            2 4.5,3 2.6,3 1.8,2 0,-1.5 2.2,
            0.056 3.222,-1.5 4.2,2 6.5,2 4.5
        ))'
    ) AS geom1,
    ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') AS geom2) AS g1;
"""

example_11_2_1_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="intersect") #Note change of geom_col! example_11_2_1_2.head()

*Out[29]:*

	they_intersect	intersect_geom_type	intersect
0	True	MULTILINESTRING	MULTILINESTRING ((-0.62000 5.84000, -0.65724 4…

they_intersect

intersect_geom_type

intersect

True

MULTILINESTRING

MULTILINESTRING ((-0.62000 5.84000, -0.65724 4…

*In[30]:*

example_11_2_1_2.plot()

*In[31]:*

#The symmetric difference is a geometry collection
sql = """
SELECT
    ST_Intersects(g1.geom1,g1.geom2) AS they_intersect,
    GeometryType(
        ST_SymDifference(g1.geom1,g1.geom2)
    ) AS intersect_geom_type,
    ST_SymDifference(g1.geom2,g1.geom1) AS intersect
FROM (
    SELECT ST_GeomFromText(
        'POLYGON((
            2 4.5,3 2.6,3 1.8,2 0,-1.5 2.2,
            0.056 3.222,-1.5 4.2,2 6.5,2 4.5
        ))'
    ) AS geom1,
    ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') AS geom2) AS g1;
"""
example_11_2_1_3 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="intersect") #Note change of geom_col!
example_11_2_1_3.head()

*Out[31]:*

	they_intersect	intersect_geom_type	intersect
0	True	GEOMETRYCOLLECTION	GEOMETRYCOLLECTION (LINESTRING (-0.62000 5.840…

they_intersect

intersect_geom_type

intersect

True

GEOMETRYCOLLECTION

GEOMETRYCOLLECTION (LINESTRING (-0.62000 5.840…

*In[32]:*

example_11_2_1_3.plot()

In the preceding listing, the first SELECT returns a polygon #1, which is pretty much the same polygon you started out with. The second SELECT returns a multilinestring composed of three linestring`s where the `polygon cuts through #2. Finally, the third SELECT returns a geometrycollection as expected, composed of a multilinestring and a polygon #3.

Splitting

Textbook Chapter/Section: 11.2.2
Textbook Start Page: 316

You just learned that using a linestring to slice a polygon with ST_Difference doesn’t work. For that, PostGIS offers another function called ST_Split. The ST_Split function can only be used with single geometries, not collections, and the blade you use to cut has to be one dimension lower than what you’re cutting up.

The following listing demonstrates the use of ST_Split.

*In[35]:*

sql = """SELECT gd.path[1] AS index, gd.geom AS geom
FROM (
    SELECT
        ST_GeomFromText(
            'POLYGON((
                2 4.5,3 2.6,3 1.8,2 0,-1.5 2.2,0.056
            3.222,-1.5 4.2,2 6.5,2 4.5
            ))'
) AS geom1,
        ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') AS geom2
) AS g1,
  ST_Dump(ST_Split(g1.geom1, g1.geom2)) AS gd"""
example_11_2_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col!
example_11_2_2.head()

*Out[35]:*

	index	geom
0	1	POLYGON ((2.00000 4.50000, 3.00000 2.60000, 3….
1	2	POLYGON ((-0.76073 1.73532, -1.50000 2.20000, …
2	3	POLYGON ((-0.69361 3.69315, -1.50000 4.20000, …

*In[36]:*

example_11_2_2.plot()

The ST_Split(A,B) function always returns a geometry collection consisting of all parts of geometry A that result from splitting it with geometry B, even when the result is a single geometry.

Because of the inconvenience of geometry collections, you’ll often see ST_Split combined with ST_Dump, as in the above listing, or with ST_CollectionExtract to simplify down to a single geometry where possible.

Tessellating

Textbook Chapter/Section: 11.2.3
Textbook Start Page: 318

Dividing your polygon into regions using shapes such as rectangles, hexagons, and triangles is called tessellating. It’s often desirable to divide your regions into areas that have equal area or population for easier statistical analysis. In this section, we’ll demonstrate techniques for achieving equal-area regions.

Tessellating looks like this:

$Tessellating$

CREATING A GRID AND SLICING TABLE GEOMETRIES WITH THE GRID

In this example, you’ll slice the New York city boroughs into small rectangular blocks. Your map will look like this:

We will first start off with ST_SquareGrid. ST_SquareGrid is a set returning function that returns a table consisting of 3 columns (i,j,geom). The i is the row number along the grid and j is the column number along the grid.

*In[40]:*

sql = """
WITH bounds AS (
     SELECT ST_SetSRID(ST_Extent(geom), ST_SRID(geom)) AS geom -- Creates a bounding box geometry covering the extent of the boroughs
        FROM ch11.boroughs
    GROUP BY ST_SRID(geom)
 ),
 grid AS (SELECT g.i, g.j, g.geom
          FROM bounds, ST_SquareGrid(10000,bounds.geom) AS g
          )                                                    -- Creates a square grid each square grid is of 10000 length/width units (feet)
 SELECT b.boroname, grid.i, grid.j,
      CASE WHEN ST_Covers(b.geom,grid.geom) THEN grid.geom
      ELSE ST_Intersection(b.geom, grid.geom) END AS geom      -- Clips the boroughs by the squares
 FROM ch11.boroughs AS b
     INNER JOIN grid ON ST_Intersects(b.geom, grid.geom);
"""

example_11_2_3_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col! example_11_2_3_1.head()

*Out[40]:*

	boroname	i	j	geom
0	Staten Island	91	11	POLYGON ((920000.000 117907.700, 918916.902 11…
1	Staten Island	91	12	POLYGON ((912498.141 120000.000, 912287.069 12…
2	Staten Island	91	13	POLYGON ((915779.150 130000.000, 915731.347 13…
3	Staten Island	91	14	POLYGON ((916233.310 140000.000, 917776.001 14…
4	Staten Island	92	11	POLYGON ((930000.000 116834.215, 928239.005 11…

*In[38]:*

example_11_2_3_1.plot()

*In[39]:*

#Add to db
example_11_2_3_1.to_postgis(name="boroughs_square_grid", schema="ch11", con=engine)

The above code uses a grid where each square is 10,000 units in length/width and spans the full extent of NYC boroughs. Since the boroughs are in NY State plane feet (SRID=2263), the units of measure are square feet. When you clip the bouroughs by the grid, the resulting tiles have various shapes and sizes when they are on bourough boundaries. This may not be ideal if you want all your tiles to be the same size. In those cases you may want to forgo the clipping and just return the squares as is. Note that <3> uses a CASE statement. This is equivalent in result to just doing ST_Intersection, but because ST_Intersection is an intensive operation, you save a lot of processing cycles by just returning the square if it is completely covered by the borough.

The above code defined a bounds that covered the whole area of interest and split that into squares and then clipped the geometries using those squares. One feature of the ST_SquareGrid function that is not obvious from the above code is that for any given SRID and size there is a unique dicing of grids that can be formed across all space. Which means for any given SRID and size a particular point will have exactly the same i,j, geom tile it intersects in.

*In[41]:*

#Dividing the NYC boroughs bounds into rectangular blocks
#This code creates a square grid each square grid is of 10000 sq units (feet)
sql = """
SELECT b.boroname, grid.i, grid.j,
    CASE WHEN ST_Covers(b.geom,grid.geom) THEN grid.geom
      ELSE ST_Intersection(b.geom, grid.geom) END AS geom             -- Creates a bounding box geometry covering the extent of the boroughs
 FROM ch11.boroughs AS b
     INNER JOIN ST_SquareGrid(10000,b.geom) AS grid                   -- Clips the boroughs by the squares
      ON ST_Intersects(b.geom, grid.geom);
"""

example_11_2_3_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col! example_11_2_3_2.head()

*Out[41]:*

	boroname	i	j	geom
0	Staten Island	91	11	POLYGON ((920000.000 117907.700, 918916.902 11…
1	Staten Island	91	12	POLYGON ((912498.141 120000.000, 912287.069 12…
2	Staten Island	91	13	POLYGON ((915779.150 130000.000, 915731.347 13…
3	Staten Island	91	14	POLYGON ((916233.310 140000.000, 917776.001 14…
4	Staten Island	92	11	POLYGON ((930000.000 116834.215, 928239.005 11…

*In[42]:*

example_11_2_3_2.plot() #Note it is the same result as before

*In[43]:*

#Add to db
example_11_2_3_2.to_postgis(name="boroughs_square_grid2", schema="ch11", con=engine)

If this is a grid you’ll be using often, it’s best to make a physical table (materialize it) out of the bounds and use it to chop up your area as needed. However if this is one off chunking for specific areas or your area is huge and you want to do it in bits, then it is better to follow the listing 11.5 model.

Often times a hexagonal grid works better for splitting space evenly as each hexagon has 6 neighbors it is equidistant to. This is not the case with squares. A popular gridding system for the world is the H3 scheme created by Uber for dividing driving regions. It uses hexagons eng.uber.com/h3/ to divide the earth and a dymaxion projection en.wikipedia.org/wiki/Dymaxion_map. This has the feature of allowing large hexagons to more or less contain smaller ones.

Sharding

Textbook Chapter/Section: 11.2.3
Textbook Start Page: 327

One common reason for breaking a geometry into smaller bits is for performance. The reason is that operations such as intersections and intersects work much faster on geometries with fewer points or smaller area. If you have such a need, the fastest sharding function is the ST_Subdivide(postgis.net/docs/ST_Subdivide.html) function. ST_Subdivide is a set returning function that returns a set of geometries where no geometry has more than max_vertices designated. It can work with areal, linear, and point geometries.

This next example divides Queens such that no polygon has more than 1/4th total vertices of original.

*In[44]:*

sql = """
SELECT row_number() OVER() AS bucket,x.geom,
  ST_Area(x.geom) AS area,
  ST_NPoints(x.geom) AS npoints,
  (ref.npoints/4)::integer As max_vertices
FROM (SELECT geom,                                             -- Geometry to slice
        ST_NPoints(geom) AS npoints
      FROM ch11.boroughs WHERE boroname = 'Queens') AS ref
         , LATERAL ST_Subdivide(ref.geom,
                  (ref.npoints/4)::integer                     -- Max number shards
                  ) AS x(geom) ;
"""

example_11_2_3_3 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col! example_11_2_3_3.head()

*Out[44]:*

	bucket	geom	area	npoints	max_vertices
0	1	POLYGON ((1016187.550 141290.140, 1011914.750 …	3.266832e+08	26	239
1	2	POLYGON ((1039907.726 150741.984, 1034839.909 …	5.422707e+08	73	239
2	3	POLYGON ((1048784.526 167596.255, 1048727.701 …	2.061526e+08	166	239
3	4	POLYGON ((1059790.715 184428.326, 1059759.660 …	2.899369e+08	88	239
4	5	POLYGON ((1031087.953 184428.326, 1022870.118 …	5.098495e+08	161	239

*In[45]:*

example_11_2_3_3.plot()

Here is a color-coded example of the above plot:

You will often find ST_Subdivide paired with ST_Segmentize. ST_Segmentize is used to even out the segments so that when ST_Subdivide does the work of segmentizing, the areas have a better chance of being equal.

In the above queries with ST_Subdivide use LATERAL join constructs. In the case of set-returning functions, the LATERAL keyword is optional so you will find people often ommitting LATERAL even though they are doing a LATERAL join.

Segmentizing `LINESTRING`s

Textbook Chapter/Section: 11.3.1
Textbook Start Page: 330

There are several reasons why you might want to break up a linestring into segments: - To improve the use of spatial indexes—a smaller linestring will have a smaller bounding box. - To prevent linestrings from stretching beyond one unit measure. - As a step toward topology and routing to determine shared edges.

If you have long linestrings where the vertices are fairly far apart, you can inject intermediary points using the ST_Segmentize function. ST_Segmentize adds points to your linestring to make sure that no individual segments in the linestring exceed a given length. ST_Segmentize exists for both geometry and geography types.

For the geometry version of ST_Segmentize, the measurement specified with the max length is in the units of the spatial reference system. For geography, the units are always meters.

In the following listing, you’re segmentizing a 4 vertex linestring into 10,000-meter segments.

*In[46]:*

#This is our linestring
sql = """
SELECT
    geog::geometry AS geom,
    ST_NPoints(geog::geometry) AS np_before,
    ST_NPoints(ST_Segmentize(geog,10000)::geometry) AS np_after
FROM ST_GeogFromText(
    'LINESTRING(-117.16 32.72,-71.06 42.35,3.3974 6.449,120.96 23.70)'
) AS geog;
"""

example_11_3_1_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col! example_11_3_1_1.head()

*Out[46]:*

	geom	np_before	np_after
0	LINESTRING (-117.16000 32.72000, -71.06000 42….	4	3585

geom

np_before

np_after

LINESTRING (-117.16000 32.72000, -71.06000 42….

3585

*In[47]:*

example_11_3_1_1.plot()

In this listing, you start with a 4-point linestring. After segmentizing, you end up with a 3,585-point linestring where the distance between any two adjacent points is no more than 10,000 meters. You cast the geography object to a geometry to use the ST_NPoints function. The ST_NPoints function does not exist for geography.

As mentioned earlier, ST_Subdivide is often used with ST_Segmentize. This is because ST_Subdivide cuts at vertices, so if you wanted to then break apart your linestring into separate linestrings, you can use ST_Subdivide, but need to first cast to geometry and then back to geography as follows:

*In[49]:*

#This is our linestring
sql = """
SELECT
    sd.geom::geography AS geo,
    ST_NPoints(sd.geom) AS np_after
FROM ST_GeogFromText(
    'LINESTRING(-117.16 32.72,-71.06 42.35,3.3974 6.449,120.96 23.70)'
) AS geog,
  LATERAL ST_Subdivide( ST_Segmentize(geog,10000)::geometry,
                  3585/8) AS sd(geom);
"""

example_11_3_1_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geo") #Note change of geom_col! example_11_3_1_2.head()

*Out[49]:*

	geo	np_after
0	LINESTRING (-117.16000 32.72000, -117.08387 32…	270
1	LINESTRING (-94.55382 40.08841, -94.49497 40.1…	236
2	LINESTRING (-71.94764 42.35359, -71.84901 42.3…	152
3	LINESTRING (-57.63000 40.52939, -57.54303 40.5…	367
4	LINESTRING (-27.86500 29.73958, -27.81249 29.7…	230

*In[50]:*

example_11_3_1_2.plot()

Scaling

Textbook Chapter/Section: 11.4.2
Textbook Start Page: 337

The scaling family of functions comes in four overloads, one for 2D ST_Scale(geometry, xfactor, yfactor), one for 3D ST_Scale(geometry, xfactor, yfactor, zfactor), one for any geometry dimension but will scales the coordinates based on a factor specified as a point ST_Scale(geometry, factor), and the newest addition introduced in PostGIS 2.5 is a version that will scale same amount but about a specfied point ST_Scale(geom, factor, geometry origin). (postgis.net/docs/ST_Scale.html).

Scaling takes a coordinate and multiplies it by the factor parameters. If you pass in a factor between 1 and -1, you shrink the geometry. If you pass in negative factors, the geometry will flip in addition to scaling. Below shows an example of scaling a hexagon.

*In[68]:*

sql = """
SELECT
    xfactor, yfactor, zfactor, geom,                                                                  -- Scale in x and y direction
    ST_Scale(hex.geom, xfactor, yfactor) AS scaled_geometry,                                          -- Scaling values
    ST_Scale(hex.geom, ST_MakePoint(xfactor,yfactor, zfactor) ) AS scaled_using_pfactor               -- Scale using point factor
FROM (
        SELECT ST_GeomFromText(
            'POLYGON((0 0,64 64,64 128,0 192, -64 128,-64 64,0 0))'
        ) AS geom                                                                                     -- Original hexagon
    ) AS hex
    CROSS JOIN
    (SELECT x*0.5 AS xfactor FROM generate_series(1,4) AS x) AS xf
    CROSS JOIN
    (SELECT y*0.5 AS yfactor FROM generate_series(1,4) AS y) AS yf
    CROSS JOIN
    (SELECT z*0.5 AS zfactor FROM generate_series(0,1) AS z) AS zf;
"""
hexagon_orig = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col!
hexagon_orig.head()

*Out[68]:*

	xfactor	yfactor	zfactor	geom	scaled_geometry	scaled_using_pfactor
0	0.5	0.5	0.0	POLYGON ((0.000 0.000, 64.000 64.000, 64.000 1…	0103000000010000000700000000000000000000000000…	0103000000010000000700000000000000000000000000…
1	1.0	0.5	0.0	POLYGON ((0.000 0.000, 64.000 64.000, 64.000 1…	0103000000010000000700000000000000000000000000…	0103000000010000000700000000000000000000000000…
2	1.5	0.5	0.0	POLYGON ((0.000 0.000, 64.000 64.000, 64.000 1…	0103000000010000000700000000000000000000000000…	0103000000010000000700000000000000000000000000…
3	2.0	0.5	0.0	POLYGON ((0.000 0.000, 64.000 64.000, 64.000 1…	0103000000010000000700000000000000000000000000…	0103000000010000000700000000000000000000000000…
4	0.5	0.5	0.5	POLYGON ((0.000 0.000, 64.000 64.000, 64.000 1…	0103000000010000000700000000000000000000000000…	0103000000010000000700000000000000000000000000…

*In[69]:*

#Original Hexagon
hexagon_orig.plot()

*In[70]:*

#Scaled Hexagon
hexagon_scaled = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="scaled_geometry") #Note change of geom_col!
hexagon_scaled.plot()

*In[71]:*

#Scaled Hexagon Point Factor
hexagon_point = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="scaled_using_pfactor") #Note change of geom_col!
hexagon_point.plot()

You start with a hexagonal polygon #1 and shrink and expand the geometry in the X and Y directions from 50% of its size to twice its size by using a cross join that generates numbers from 0 to 2 in X and 0 to 2 in Y, incrementing .5 for each step #2. The results are shown below:

The scaling multiplies the coordinates. Because the hexagon starts at the origin, all scaled geometries still have their bases at the origin. Normally when you scale, you want to keep the centroid constant. See below for an exampe:

*In[72]:*

sql = """
SELECT xfactor, yfactor,
        ST_Scale(hex.geom, ST_MakePoint(xfactor, yfactor), ST_Centroid(hex.geom) )
        AS scaled_geometry
FROM (
        SELECT ST_GeomFromText(
            'POLYGON((0 0,64 64,64 128,0 192,-64 128, -64 64,0 0))'
        ) AS geom
    ) AS hex
    CROSS JOIN
    (SELECT x*0.5 AS xfactor FROM generate_series(1,4) AS x) AS xf
    CROSS JOIN
    (SELECT y*0.5 AS yfactor FROM generate_series(1,4) AS y) AS yf;
"""
centroid = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="scaled_geometry") #Note change of geom_col!
centroid.plot()

Here you scale a hexagon from half to twice its size in the X and Y directions about the centroid so the centroid remains unaltered. See below:

Rotating

Textbook Chapter/Section: 11.4.3
Textbook Start Page: 340

ST_RotateX, ST_RotateY, ST_RotateZ, and ST_Rotate rotate a geometry about the X, Y, or Z axis in radian units. ST_Rotate and ST_RotateZ are the same because the default axis of rotation is Z. ST_RotateX, ST_RotateY, ST_RotateZ, and ST_Rotate rotate a geometry about the X, Y, or Z axis in radian units. ST_Rotate and ST_RotateZ are the same because the default axis of rotation is Z.

These functions are rarely used in isolation because their default behavior is to rotate the geometry about the (0,0) origin rather than about the centroid. You can pass in an optional point argument called pointOrigin. When this argument is specified, rotation is about that point; otherwise, rotation is about the origin.

The following listing rotates a hexagon about its centroid in increments of 45 degrees.

*In[78]:*

#This is our original hexagon
sql = """
SELECT
    hex.geom as orig_geom,
    rotrad/pi()*180 AS deg,
    ST_Rotate(hex.geom,rotrad,
    ST_Centroid(hex.geom)) AS rotated_geometry
FROM (
        SELECT ST_GeomFromText(
            'POLYGON((0 0,64 64,64 128,0 192,-64 128,-64 64,0 0))'
        ) AS geom
    ) AS hex
CROSS JOIN (
        SELECT 2*pi()*x*45.0/360 AS rotrad
            FROM generate_series(0,6) AS x
) AS xf;
"""
orig_item = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="orig_geom") #Note change of geom_col!
orig_item.plot()

*In[80]:*

rotate = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="rotated_geometry") #Note change of geom_col!
rotate.plot()

The above plot looks like a big blob…here is a better representation:

Geospatial Analytics using Python and SQL

A Crash Course in PostGIS

Required Packages

Load Data #1

Connect to Postgres

Review: Accessing our Postgres Data

Agenda

Creating a single MULTIPOLYGON from many

Creating a LINESTRING from `POINT`s

Clipping

Example: What’s left of the polygon and line after clipping

Splitting

Tessellating

CREATING A GRID AND SLICING TABLE GEOMETRIES WITH THE GRID

Sharding

Segmentizing `LINESTRING`s

Scaling

Rotating

Creating a single `MULTIPOLYGON` from many

Creating a `LINESTRING` from `POINT`s