Geographic Information System
for Breast Cancer Studies on Long Island (LI GIS)

Long Island Breast Cancer Study and the GIS-H

There are 49 slides in this presentation.

Slide 1:

Long Island Breast Cancer Study and the GIS-H (Health)
Presentation by Edward J. Trapido
Associate Director, Epidemiology and Genetics Research Program
Division of Cancer Control and Population Sciences
National Cancer Institute
For the Comprehensive Approaches to Cancer Control Conference
September 2003, Atlanta, Georgia

Slide 2:

This slide shows a color-coded map of the United States (lower 48 states) that indicates  patterns of breast cancer death rates (per 100,000 population) for 1970 to 1992 for white women by state economic area.  The northeastern United States, including Long Island (Nassau and Suffolk counties), had elevated rates of breast cancer over this period. 

The map was created by Susan S. DeVesa and Dan J. Grauman for the paper “Geographic Variation in U.S. Breast Cancer Death Rates,” Journal of the National Cancer Institute 1995; 87:1846-1853.

Slide 3:

Long Island Breast Cancer Study Project

  • Grew out of community’s concern
  • A multistudy investigation of  environmental factors and breast cancer
  • NCI developed the GIS-H in response to a law passed in 1993.

Slide 4:

Public Law 103-43,

June 10, 1993

  • “The director of the NCI. . .shall conduct a case-control study [of] factors contributing to the incidence of breast cancer in:
    • The counties of Nassau and Suffolk, and
    • The 2 counties in the northeastern U.S. [that] had the highest age-adjusted mortality rate of such cancer. . .”

Slide 5:

Public Law 103-43,

GIS Requirement

Certain elements of the study...shall include the use of a geographic system to evaluate the current and past exposure of individuals, including direct monitoring and cumulative estimates of exposure, to:

  1. contaminated drinking water
  2. sources of indoor and ambient air pollution, including emissions from aircraft
  3. electromagnetic fields
  4. pesticides, and other toxic chemicals
  5. hazardous and municipal waste
  6. other factors as appropriate.

Slide 6:

A Tool for Studying Environment & Breast Cancer

This graphic is an illustration of how the GIS-H can combine historical, human, and environmental data into map layers for study.

Slide 7:

Geographic Extent of the GIS-H

  • Nassau and Suffolk counties - detailed health, demographic, environmental data.
  • Buffer counties within 50 km - additional environmental data (data available with less precision and detail)
  • Extended area within 100 miles of mid-point of counties' boundary line (limited data).

Slide 8:

This slide shows a screen shot of the GIS-H home page.

Slide 9:


Two Sections

  • Public Use
  • Researcher’s Use.

Slide 10:

Levels of Access

  • Public
    • Public data
  • Secure (for researchers)
    • All public data
    • Protocol restricted data
      • Requires approval for each researcher and project.

Slide 11:

Public Use

The graphic illustrates how the layers in a GIS are an abstraction of reality.  Each layer is one element of what is actually on the ground in real life.  For example, there is a road layer, environmental layers, and governmental layers.  In reality, there is an infinite number of data sources that could be quantified and added to the GIS as map layers.

ArcExplorer allows the public to:

  • create their own maps using publicly available data
  • use additional interactive features and flexibility, including unique combinations of layers.

Slide 12:

Public Use

  • 16 interactive maps with up to 9 environmental exposure layers.
  • Each will be on the public website.
  • Map topics and the number of exposures reflect the interests and concerns of community members.

Slide 13:

For Researchers

Enable researchers to:

  • Explore and synthesize available information on potential exposures
  • Generate hypotheses
  • Identify spatial and temporal clusters of disease
  • Evaluate risk factors for breast cancer and other health outcomes (with your addition of data)
  • Address methodological issues
  • Identify gaps in available information.

Slide 14:

What Questions Can Be Addressed?

  • What are the rates of breast cancer in the community (overall, in smaller areas)?
  • Can we identify clusters of cases, or areas with significantly higher rates?
  • Where might exposures of interest (to scientists, to the community) come from?
  • Are there correlations -- spatial relationships -- between disease and potential exposures?
  • More sophisticated: Are potential environmental exposures linked with breast cancer, taking other factors into account?

Slide 15:

Data Included in the GIS-H

  • Geospatial
  • Demographic and Behavioral
  • Health
  • Environmental.

Slide 16:


Base Maps:

  •  Cadastral* data (tax lots, parcels)
  • Political boundaries
  • Roads
  • Railroads
  • Hydrology (water supply, rivers, streams)
  • Aerial photography and satellite imagery.

* showing property boundaries, subdivision levels, etc.

Slide 17:

This slide is a screen shot of one of the interactive maps that is under development for the public Web site.  Overlaid on a map of Long Island are different colors representing different types of land use as categorized by the U.S. Geological Survey.

Slide 18:

Demographic and Lifestyle

  • Census Data:
    • Counts of the population
    • Descriptive information about individuals
      • Age, Race, Gender, Income groupings
    • Households
      • Type and age of housing
      • Rural or urban
  • National Nutritional Health and Lifestyle Survey.

Slide 19:

This slide is a screen shot from the GIS-H public Web site that illustrates that the different data sources (geospatial, demographic/behavioral, health, and environmental) have metadata that conform to Federal Geographic Data Committee (FGDC) standards.

Slide 20:

This screen shot is of one of the interactive maps under development for the public Web site.  The map shows Long Island with layers representing 1990 income demographics, breast cancer incidence by Zip Code, and county boundaries.

Slide 21:


Medical outcomes

  • State Cancer Registry (yes and no)
    • Rates by Zip Code available for 1993-97
    • Others available from registry
  • Medicare
  • Hospital discharges
  • Medical facilities.

Slide 22:

This screen shot is of one of the interactive maps under development for the public Web site.  Suffolk County is shown with layers representing breast cancer incidence by Zip Code boundaries and with the location of health facilities and mammography clinics pinpointed.

Slide 23:


  • Air quality monitoring results
  • Drinking water analysis and water use
  • Industrial sites, industrial releases, and hazardous materials
  • Radioactive sites or materials
  • Land use and land cover
  • Traffic volume
  • Weather and climate information
  • Other: weather, satellite image maps.

Slide 24:

This screen shot is of one of the interactive maps under development for the public Web site and shows an area of Long Island with the following layers of data represented by symbols pinpointing the locations:  tetrachloroethene detection by place, chloroterephthalic acid detection by place, TRI Waste Release, gas stations, Suffolk Well VOC Detection. 

Slide 25:

Data Sources

  • County Water Authorities and Departments of Health
  • State Departments of Environmental Conservation, Health, Labor, and Public Service
  • Federal Centers for Disease Control, National Center for Health Statistics, Environmental Protection Agency, Nuclear Regulatory Commission, Geological Survey, Census, Department of Agriculture.

Slide 26:


  • Need to understand data limitations
  • For each dataset, information on…
    • Identification
    • Data quality
    • Spatial data organization
    • Spatial reference information
    • Entity and attribute overview
    • Distribution.

Slide 27:

This screen shot shows the Metadata browser for the GIS-H.  It is a tree structure view of the organization of the metadata within the system.

Slide 28:

Researcher’s Toolbox

  • ArcView, Spatial Analyst and 3D Analyst
  • Extensions developed especially for GIS-H
    • Add Database Theme and Table Tools
    • Case File Formatter
    • Data query wizard
    • Disease Rate Calculator (graphic)
    • Areal Interpolator  (graphic)
    • Cluster Analysis Tool (to facilitate using SaTScan)
    • Empirical Bayes Tool
    • Geographic masking.

Slide 29:

Researcher’s Toolbox

  • Additional software for researchers' use
    • Adobe Photoshop
    • ArcInfo
    • SAS
    • S-Plus
    • WinBUGS
  • User's guide.

Slide 30:

Disease Rate Calculator

This screen shot illustrates that a custom extension has been developed to calculate directly-adjusted rates for selected census tracts.  (The extension can be freely downloaded from the public Web site.)
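As a rough illustration of what a directly adjusted rate is, the sketch below computes one from age-specific case counts, populations, and standard-population weights. It is not the GIS-H extension's code; the function name, the three age groups, and all numbers are hypothetical and chosen only for illustration.

```python
def directly_adjusted_rate(cases, populations, standard_weights, per=100_000):
    """Directly age-adjusted rate.

    Each age group's observed rate (cases / population) is weighted by
    that group's share of a standard population, and the weighted rates
    are summed. All inputs are parallel lists, one entry per age group.
    """
    if not (len(cases) == len(populations) == len(standard_weights)):
        raise ValueError("inputs must have one entry per age group")
    total_weight = sum(standard_weights)
    if total_weight <= 0:
        raise ValueError("standard weights must sum to a positive value")

    adjusted = 0.0
    for group_cases, group_pop, weight in zip(cases, populations, standard_weights):
        if group_pop <= 0:
            raise ValueError("each age group needs a positive population")
        age_specific_rate = group_cases / group_pop
        adjusted += age_specific_rate * (weight / total_weight)
    return adjusted * per


# Hypothetical example: three age groups in a set of selected census tracts.
example_rate = directly_adjusted_rate(
    cases=[2, 10, 30],
    populations=[10_000, 8_000, 5_000],
    standard_weights=[0.5, 0.3, 0.2],
)
```

Because the weights come from a shared standard population, rates computed this way can be compared across areas with different age structures.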

Slide 31:

Areal Interpolator

This screen shot illustrates that a custom extension has been developed to interpolate Zip Code population from census tract population.  (The extension can be freely downloaded from the public Web site.)
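The slide does not say which interpolation method the extension uses. One common approach is simple area weighting, sketched below under the assumption that population is spread evenly within each census tract. The function name, identifiers, and figures are hypothetical, and the overlap areas are assumed to have been computed beforehand in a GIS.

```python
from collections import defaultdict


def area_weighted_population(tract_population, tract_area, overlaps):
    """Estimate Zip Code populations from census tract populations.

    tract_population: {tract_id: population}
    tract_area:       {tract_id: total tract area}
    overlaps:         iterable of (tract_id, zip_id, overlap_area), where
                      overlap_area is the part of the tract inside the Zip Code

    Assumes (for illustration) that people are evenly distributed
    within each tract, so population is split in proportion to area.
    """
    zip_population = defaultdict(float)
    for tract_id, zip_id, overlap_area in overlaps:
        share = overlap_area / tract_area[tract_id]
        zip_population[zip_id] += tract_population[tract_id] * share
    return dict(zip_population)


# Hypothetical example: tract A straddles two Zip Codes, tract B lies in one.
estimated = area_weighted_population(
    tract_population={"A": 1000, "B": 500},
    tract_area={"A": 10.0, "B": 5.0},
    overlaps=[("A", "Z1", 6.0), ("A", "Z2", 4.0), ("B", "Z2", 5.0)],
)
```

The even-distribution assumption is the main weakness of this approach; tracts that include parks, water, or industrial land would need a more refined weighting.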

Slide 32:

Cluster Analysis

This screen shot illustrates that a custom extension has been developed to check for clusters of sample cases.  It uses SaTScan software as the cluster analysis engine.  (The extension can be freely downloaded from the public Web site.)

Slide 33:

Geographic Masking

This screen shot illustrates that a custom extension has been developed to mask selected sample cases using a random perturbation method.  (The extension can be freely downloaded from the public Web site.)
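The general idea of random perturbation can be sketched as follows: each case location is moved a random distance in a random direction so that the exact address cannot be recovered. This is only one simple variant, not necessarily the one the extension implements; the function name, distance bounds, and coordinates are hypothetical, and coordinates are assumed to be planar (for example, meters in a projected system).

```python
import math
import random


def mask_points(points, min_dist, max_dist, seed=None):
    """Displace each (x, y) point by a random distance and direction.

    The distance is drawn uniformly from [min_dist, max_dist] and the
    direction uniformly from the full circle. A seed can be supplied to
    make a run repeatable; it should be kept confidential in practice,
    since knowing it would allow the displacement to be reversed.
    """
    if not 0 <= min_dist <= max_dist:
        raise ValueError("require 0 <= min_dist <= max_dist")
    rng = random.Random(seed)
    masked = []
    for x, y in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        dist = rng.uniform(min_dist, max_dist)
        masked.append((x + dist * math.cos(angle), y + dist * math.sin(angle)))
    return masked


# Hypothetical example: mask three case locations by 100 to 500 units.
original_points = [(1000.0, 2000.0), (1500.0, 2500.0), (3000.0, 1000.0)]
masked_points = mask_points(original_points, min_dist=100.0, max_dist=500.0, seed=1)
```

A nonzero minimum distance keeps a masked point from landing on or next to the true location; a larger maximum gives more protection but distorts spatial analyses more.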

Slide 34:

This slide is the first of several slides that show examples of the interactive maps under development for the public Web site.

A screen shot is shown with a map of Long Island.  On the right, there is a table from which to choose data on Suffolk County Air Monitoring Sites to display.  Selected and represented are:  AIRS Benzene Monitoring, AIRS Benzo(a)pyrene Monitoring, Suffolk Benzene Air Monitoring, Suffolk Chlorobenzene Air Monitoring, Suffolk Toluene Air Monitoring,  Suffolk m-Xylene Air Monitoring, Suffolk o-Xylene Air Monitoring, County Boundaries, and Breast Cancer Incidence by Zip Code.

Slide 35:

This screen shot shows the layer with Breast Cancer Incidence by Zip Code and focuses on Oyster Bay.  The TRI Arsenic Air Release data layer is activated, and in a window, one can view some data available for the Oyster Bay location. 

Slide 36:

This screen shot focuses on Oyster Bay again.  Represented this time are TRI Arsenic Release data and Breast Cancer Incidence by Zip Code.

Slide 37:

This slide is a screen shot from the GIS-H public Web site of the Data Warehouse Directory.  Specifically depicted is the page for TRI Air and Land Data, which shows the data sources that are aggregated into the TRI facts.

Slide 38:

This example is a screen shot showing the layer with Breast Cancer Incidence Rates by Zip Code and the location of inactive hazardous waste sites.  The Syosset Landfill is selected, and a window shows the data available for the landfill.

Slide 39:

This screen shot shows the layer with Breast Cancer Incidence Rates by Zip Code, TRI Facilities – Air Release-Arsenic, and buffer zones of 0.5, 1.0, and 1.5 miles around the TRI sites.

Slide 40:

This map shows how population data by race/ethnicity can be selected from a table on the right.

Slide 41:

In this screen shot, the map layers selected are for Active Hazardous Waste Sites, Inactive Hazardous Waste Sites, EPA Hazardous Waste Sites, and Breast Cancer Incidence by Zip Code.

Slide 42:

In this screen shot, one layer that was represented in the previous slide is eliminated:  EPA Hazardous Waste Sites.

Slide 43:

A screen shot is shown of what might be a finished product map.  It has a layer depicting breast cancer incidence rates by Zip Code, a layer with waste sites, and buffers around waste sites.  Such a map might be used to explore the question:  Are Disease Rates “Near” Waste Sites the Same as in Areas Outside “Buffers”?  The answer is not entirely clear, although there may be something worth investigating further.
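A first numerical look at this question could compare crude rates for areas inside and outside the buffers, as in the sketch below. It assumes each area is represented by a centroid in planar coordinates and counts an area as "near" if its centroid falls within the buffer radius of any site. The function name, the centroid rule, and all numbers are hypothetical; the result is a descriptive comparison only, not a statistical test, and it does not adjust for age or other factors.

```python
import math


def rates_inside_outside(areas, sites, radius, per=100_000):
    """Crude rates for areas inside vs. outside buffers around sites.

    areas:  iterable of (x, y, cases, population), one per area centroid
    sites:  iterable of (x, y) site locations
    radius: buffer radius, in the same units as the coordinates

    Returns (inside_rate, outside_rate); a rate is None when the
    corresponding group has no population.
    """
    sites = list(sites)
    totals = {True: [0, 0], False: [0, 0]}  # near? -> [cases, population]
    for x, y, cases, population in areas:
        near = any(math.hypot(x - sx, y - sy) <= radius for sx, sy in sites)
        totals[near][0] += cases
        totals[near][1] += population

    def crude_rate(cases, population):
        return cases * per / population if population else None

    return crude_rate(*totals[True]), crude_rate(*totals[False])


# Hypothetical example: two areas near a single site, one area far away.
inside_rate, outside_rate = rates_inside_outside(
    areas=[(0.0, 0.0, 12, 10_000), (0.5, 0.0, 8, 10_000), (5.0, 5.0, 15, 20_000)],
    sites=[(0.0, 0.0)],
    radius=1.0,
)
```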

Slide 44:

What Is the Availability of the GIS-H?

  • Available now to researchers with approved projects
  • Public mapping features available soon.

Slide 45:

Important Issues

  • Data are imperfect
    • Examples:  addresses, sparse data, data collected for other purposes
    • Potential exposure not necessarily actual exposure
    • Time frame and latency of cancer
    • Substitutions and additions may be recommended as we go along
  • The website will not include software to keep
  • The eye is not a good analytic tool
  • Confidentiality

Slide 46:

This is a screen shot from the Researchers section on the GIS-H public Web site. It shows where researchers can come to obtain information on applying to use the GIS-H.

Slide 47:

In Summary, the GIS-H is…

  • Comprehensive, integrated data warehouse (> 80 datasets)
  • Flexible and expandable
  • Can integrate external datasets
  • Sophisticated Researcher’s Toolbox
  • Community input and access
  • Systematic attempt to include high quality data, comprehensive metadata
  • A prototype and resource for future studies.

Slide 48:


  • Access to researcher site is limited to investigators with approved protocols
  • For additional information, visit the GIS-H web site.

Slide 49:

Thank You!