Most image files do not just contain a picture. They also contain information (metadata) about the picture. Metadata provides information about a picture's pedigree, including the type of camera used, color space information, and application notes.
Finding Metadata
Different picture formats include different types of metadata. Some formats, like BMP, PPM, and PBM contain very little information beyond the image dimensions and color space. In contrast, a JPEG from a camera usually contains a wide variety of information, including the camera's make and model, focal and aperture information, and timestamps.
PNG files typically contain very little information... unless the image was converted from a JPEG or edited with Photoshop. Converted PNG files may include metadata from the source file format. Similarly, WebP files are nearly always converted from another file format. Many converters omit metadata, and WebP does not have facilities for storing all of the various metadata fields found in JPEG images.
Extracting Metadata
Viewing metadata requires extracting the information from the file. There are plenty of open source, free, and commercial solutions available. Some of these only support one file type (e.g., JPEG-only), while others support many file formats. In addition, different programs may support different types of metadata.
A few examples of available image metadata tools:
Exiv2 is an open source tool that decodes Exif, IPTC, and XMP metadata. (See Metadata Types for a description of these metadata records.) This command-line utility is provided as an executable for Windows, or source code for Linux and Mac.
ExifTool by Phil Harvey is one of the most powerful command-line metadata extraction tools available. It supports hundreds of different file and metadata formats -- including many that are manufacturer-specific. The entire tool is well-documented and written in Perl; it works very well under Linux and Mac, but could be difficult for Windows users who do not have Perl installed.
Adobe Photoshop is a commercial application that includes an XMP viewer. In Photoshop CS5, it is under File → File Info. While not as powerful or as complete as Exiv2 and ExifTool, Adobe's viewer does provide the ability to decode XMP, IPTC, Exif, and other types of metadata in a graphical interface.
Preview Inspector. The default Apple Mac OS X picture viewer is called Preview. Preview contains an 'Inspector' to view metadata. This tool displays a small fraction of the available metadata and can provide misleading analysis results. Do not use Preview Inspector for any official work.
Microsoft Windows Photo Viewer. The default photo viewer under Windows 7 and Windows 8 contains a 'properties' option that lists metadata. However, this tool displays fictional metadata fields that do not exist in the file, omits most fields that do exist, rewrites some metadata values, and renames some of the fields that it displays. Do not use Windows Photo Viewer for any metadata analysis.
There are also plenty of online web sites where you can upload a picture to see the metadata. Virtually all of them use ExifTool or Exiv2 as the back-end data extractor.
FotoForensics uses ExifTool for metadata extraction. In some situations, the ExifTool results are augmented with additional information identified by tools provided by Hacker Factor.
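As a rough illustration of command-line extraction, the sketch below calls ExifTool from Python and groups the results by metadata block. It assumes that `exiftool` is installed and on the PATH. The `-j` (JSON output) and `-G` (prefix each tag with its group name) options belong to ExifTool; the helper names `exiftool_command`, `group_tags`, and `read_metadata` are made up for this example.

```python
import json
import shutil
import subprocess


def exiftool_command(path):
    """Build the ExifTool command line: JSON output, tags prefixed by group."""
    return ["exiftool", "-j", "-G", path]


def group_tags(record):
    """Split 'Group:Tag' keys from one ExifTool JSON record into nested dicts."""
    grouped = {}
    for key, value in record.items():
        group, _, tag = key.rpartition(":")
        # Keys without a group prefix (such as SourceFile) go under "Other".
        grouped.setdefault(group or "Other", {})[tag] = value
    return grouped


def read_metadata(path):
    """Run ExifTool on one file and return its tags grouped by block."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on the PATH")
    result = subprocess.run(
        exiftool_command(path), capture_output=True, text=True, check=True
    )
    # ExifTool prints a JSON array with one object per input file.
    return group_tags(json.loads(result.stdout)[0])
```

Calling `read_metadata("photo.jpg")` (a hypothetical file name) would return one dictionary per metadata block, which makes it easier to see which blocks are present and which are absent.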
Evaluating Metadata
Metadata provides information related to how the file was generated and handled. This information can be used to identify if the metadata appears to be from a digital camera, processed by a graphical program, or altered to convey misleading information. Common things to look for include:
Make, Model, and Software: These identify the device or application that created the picture. Most digital cameras include a Make and Model in the EXIF metadata block. (However, the original iPhone does not!) The Software may describe the camera's firmware version or application information.
Image size: The metadata often records the picture's dimensions. Does the rendered image size (listed at the bottom of the metadata) match the other sizes in the metadata? Many applications resize or crop pictures without updating other metadata fields.
Timestamps: Look for fields that detail timestamps. These typically identify when a picture was taken or altered. Do the timestamps match the expected timeframe?
Types of metadata: There are many different metadata types. Some are only generated by cameras, while others are only generated by applications.
Descriptions: Many pictures include embedded annotations that describe the photo, identify the photographer, or itemize alteration steps.
Missing metadata: Are any metadata fields missing? If the picture came from a digital camera, then it should have camera-specific information. Some applications and online services strip out metadata. A lack of specific metadata usually indicates a resaved picture and not an original photo.
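Parts of the checklist above can be sketched in a few lines of Python. This is only an illustration: it assumes the metadata has already been extracted into a plain dictionary keyed by tag names such as `Make`, `Model`, `ExifImageWidth`, and `ExifImageHeight`, and the function name `review_metadata` is hypothetical. The notes it returns are prompts for a closer look, not findings.

```python
def review_metadata(meta, rendered_size):
    """Return observations worth a closer look (not proof of tampering)."""
    notes = []
    # Camera originals normally identify the device.
    for field in ("Make", "Model"):
        if not meta.get(field):
            notes.append(f"missing {field}")
    # Compare the recorded dimensions with the size the image renders at.
    recorded = (meta.get("ExifImageWidth"), meta.get("ExifImageHeight"))
    if None not in recorded and recorded != tuple(rendered_size):
        notes.append(
            f"recorded size {recorded} differs from rendered size {tuple(rendered_size)}"
        )
    # No timestamp fields at all often means the metadata was stripped.
    if not any("Date" in name for name in meta):
        notes.append("no timestamp fields")
    return notes
```

A picture that reports 800x600 in its metadata but renders at 400x300 would produce a size note, which is consistent with a resize that did not update the other fields.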
Altered Metadata
Metadata is analogous to the chain of custody for evidence handling. It can identify how a picture was generated, processed, and last saved. However, some people intentionally alter metadata. They may edit timestamps or photo information in an attempt to deceive.
Fortunately, analysts have a few things in their favor.
Intentional deceptions are very uncommon. Unless you have a reason to suspect altered metadata, it is usually safe to assume that the metadata is accurate.
Altering metadata requires specialized tools and technical skill that typical users lack. And if they do alter the metadata, they are liable to forget to alter all of the related metadata fields. (With many pictures, altering a timestamp is not as simple as editing a binary file and changing the year data from "2015" to "2016".)
Applications that alter metadata typically alter the picture, either through a resave or re-encoding. This can be detectable, even if the metadata appears accurate. Remember: metadata is only one analysis technique. Deceptions can be identified by using multiple forensic analysis methods.
Updated metadata from using an application is much more common than intentional deception. In this regard, evaluating metadata can help identify how a picture was postprocessed and handled.
Advanced Analysis
Although metadata does not identify the exact changes made to the picture, it can be used to identify attributes, inconsistencies, additional sources, edits, timelines, and a rough sense of how the image was managed. In effect, metadata provides clues about an image's pedigree.
There is no standard for a required set of metadata to exist in any particular picture. However, known tools generate known metadata fields; some types of metadata are generated during a save and others are appended to existing data. Some may be updated, while others may be retained or removed. By understanding the metadata and when it can appear, an investigator can develop a timeline and identify the order of changes made to the file.
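One way to start a timeline is to collect every date field and sort the values. The sketch below assumes the metadata is a dictionary whose timestamps use the "YYYY:MM:DD HH:MM:SS" text form; the field names and values in `SAMPLE` are hypothetical and exist only to show the idea.

```python
from datetime import datetime


def build_timeline(meta):
    """Sort every parseable date field so the order of events is easy to read."""
    events = []
    for field, value in meta.items():
        if "Date" not in field or not isinstance(value, str):
            continue
        try:
            # Keep only 'YYYY:MM:DD HH:MM:SS'; any time zone suffix is dropped,
            # so events that are only hours apart may sort incorrectly.
            when = datetime.strptime(value[:19], "%Y:%m:%d %H:%M:%S")
        except ValueError:
            continue  # skip empty or malformed timestamps
        events.append((when, field))
    return sorted(events)


# Hypothetical values for illustration only.
SAMPLE = {
    "XMP:ModifyDate": "2007:06:02 09:15:00-06:00",
    "EXIF:DateTimeOriginal": "2007:05:28 12:56:08",
    "ICC_Profile:ProfileDateTime": "2003:07:01 00:00:00",
}

for when, field in build_timeline(SAMPLE):
    print(when.isoformat(), field)
```

In this made-up case the color profile predates the capture, and the modification date follows it, which is the ordering an investigator would expect from a photo that was edited after it was taken.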
Caveats
Metadata is an invisible component to a picture. While not required for rendering the picture, it does exist in the image file. Although metadata can provide a wealth of information about a photo, it also has a number of limitations. These include:
Timestamps. Many cameras and computers do not regularly synchronize clocks. As a result, times can drift. If the camera is transported across time zones, then the local time can be significantly different from the camera's clock. Also, it is common for metadata timestamps to omit time zone information.
Misleading. Metadata consists of field-value pairs. The field defines the value's purpose. Don't be confused by a field that is vendor-specific. For example, Kodak defined a large number of metadata extensions. Each of these records will have "Kodak" in the field name. That only means that the format was defined by Kodak, not that the data came from a Kodak camera. The value of these fields may identify a non-Kodak system.
Inconsistent. Not all manufacturers follow the industry standards. Many Kodak cameras set the 'Make' to 'Digital Camera' rather than 'Kodak'. Some manufacturers use the same firmware on multiple camera models, so the 'Model' may contain a list of cameras (and the list may not even include the actual camera model). A few camera models leave all of this information blank. Since many cameras have unique quirks, an inconsistent EXIF field may be due to a specific camera make/model and not an indication of tampering or modification.
Stripped. Resaving an image may strip some or all of the metadata. Some people strip metadata to obscure the image's origins. However, missing metadata raises questions when a picture is supposed to be original or was processed by an application that should always leave metadata. In addition, stripping metadata can impact how the image is rendered.
Hosting. Many online picture sharing sites strip out metadata, including Facebook, Twitter, and Imgur. Other sites, like Photobucket and Google's Picasa, may alter the metadata. Even if the picture was unaltered when it was uploaded, the hosting site may have stripped out metadata. As a result, the metadata may not identify if it came from a digital camera.
Faked. Although uncommon, people who want to create fake photos (such as pictures of UFOs, ghosts, and dead celebrities) may attempt to edit the metadata. Since many metadata fields are plain text and not part of any cryptographic checksum, simple edits to the metadata may not be detectable.
Residues. Adobe has a known problem with metadata. If you load a photo into Photoshop, paste over it with a different photo, and save it, then the new photo will retain the original metadata. This can generate misleading information since the metadata will not match the new picture.
Scanners. Pictures from scanners may appear to come from a camera or an application, depending on how it was scanned. If the picture was captured using internal firmware from a standalone scanner, then it will appear more like a camera. However, scanners accessed through a computer program, such as Photoshop, will produce images that appear to come from an application.
Metadata analysis is one of many different types of analysis. The interpretation of results from any single analysis method may be inconclusive. It is important to validate findings with other analysis techniques and algorithms.
Metadata Types
There are many different types of metadata. Some types are only generated by cameras. Other types are created by specific applications. And a few types of metadata can come from anywhere. The most common types of metadata you are likely to encounter include File, EXIF, Maker Notes, IPTC, ICC Profiles, XMP, PrintIM, and Photoshop. However, there are many other types of metadata records.
Common Metadata Blocks and Likely Sources
From Cameras: MakerNotes, PrintIM
From Applications: ICC Profile, IPTC, Photoshop, XMP
From Either: File, EXIF, JFIF, APP14
Photoshop: Don't confuse the name
The metadata block named "Photoshop" is not the same thing as the Photoshop application by Adobe.
While there is a common metadata block named 'Photoshop', this specific metadata block can come from many sources and not just Adobe Photoshop. As far as I can tell, Adobe defined the metadata block and named it after the application. However, Adobe then released it as a public standard that is independent of the application.
Some applications and post-processing systems on smartphones include a 'Photoshop' metadata block that contains a minimal amount of information. For example, Apple's iOS devices (iPhone, iPad, etc.) often add a minimal 'Photoshop' metadata block when the picture is edited or attached to an email. In contrast, when real Adobe Photoshop includes the Photoshop metadata block, it often contains more than a few metadata fields.
File
The file metadata describes the image itself. This includes the type of image (e.g., JPEG or PNG), internal formats, dimensions, and colors. If there is a comment in the file's header, then it is included here. Most digital cameras do not include comments, so the presence of a comment likely indicates that an application processed the image.
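File-level metadata comes from the file's own header rather than from an embedded block. As a minimal sketch, the function below reads the dimensions, bit depth, and color type from the start of a PNG file. The layout it relies on (an 8-byte signature followed by a 13-byte IHDR chunk) is part of the PNG format; the function name `png_dimensions` is made up for this example.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height, bit_depth, color_type) from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # After the signature: 4-byte chunk length, 4-byte chunk type, chunk data.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, bit_depth, color_type
```

Only the first 26 bytes are needed, so reading a small prefix of the file (for example, `open(path, "rb").read(32)`) is enough.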
EXIF
The Exchangeable Image File format (commonly called Exif or EXIF) is typically used by camera manufacturers to identify information about the camera's settings used for the photo. It typically includes timestamps, camera make/model, lens settings, and more.
For example, this bookshelf picture contains the following EXIF data:
EXIF
Make: Hewlett-Packard
Camera Model Name: HP PhotoSmart 618 (V1.10)
Orientation: Horizontal (normal)
X Resolution: 72
Y Resolution: 72
Resolution Unit: inches
Y Cb Cr Positioning: Centered
Exposure Time: 1/60
F Number: 2.7
ISO: 100
Exif Version: 0210
Date/Time Original: 2007:05:28 12:56:08
Components Configuration: Y, Cb, Cr, -
Compressed Bits Per Pixel: 1.6
Shutter Speed Value: 1/64
Aperture Value: 2.8
Exposure Compensation: 0
Max Aperture Value: 4.0
Subject Distance: 0.72 m
Metering Mode: Multi-segment
Light Source: Unknown
Flash: Fired
Focal Length: 7.7 mm
Flashpix Version: 0100
Color Space: sRGB
Exif Image Width: 800
Exif Image Height: 600
Image Width: 96
Image Height: 72
Bits Per Sample: 8 8 8
Compression: Uncompressed
Photometric Interpretation: RGB
Strip Offsets: 1750
Samples Per Pixel: 3
Rows Per Strip: 72
Strip Byte Counts: 20736
Planar Configuration: Chunky
This data identifies the camera's make (Hewlett-Packard) and model (HP PhotoSmart 618). The photo was taken on 28-May-2007. And the subject (the bookshelf) was about 0.72 meters in front of the camera (about 2 feet 4 inches away).
This metadata also gives information about the image itself. For example, it says it uses the standard RGB (sRGB) color space and should be 800x600. When a picture is cropped or scaled, the metadata may not be updated. If the picture is not 800x600, then at least the dimensions of the prior source are known.
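The date and distance readings above can be reproduced with a little arithmetic. The sketch below parses the Exif timestamp text and converts the subject distance from meters to feet and inches; the helper names are illustrative only.

```python
from datetime import datetime

METERS_PER_INCH = 0.0254


def parse_exif_datetime(text):
    """Parse Exif's 'YYYY:MM:DD HH:MM:SS' format; the result has no time zone."""
    return datetime.strptime(text, "%Y:%m:%d %H:%M:%S")


def meters_to_feet_inches(meters):
    """Convert meters to whole feet plus inches rounded to the nearest inch."""
    total_inches = round(meters / METERS_PER_INCH)
    return divmod(total_inches, 12)


taken = parse_exif_datetime("2007:05:28 12:56:08")
print(taken.date())                   # calendar date of the capture
print(meters_to_feet_inches(0.72))    # (feet, inches)
```

Note that the parsed timestamp carries no time zone, so it reflects whatever the camera's clock was set to.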
Some cameras, such as smartphones, may also include GPS information as a subset of the Exif record. Although this is rare to come across on the web, if it exists then it will be decoded in the Exif block. (ExifTool will also decode this under a "Composite" heading.)
Maker Notes
Besides the standardized Exif information, cameras may also include manufacturer-specific extensions. In general, pictures that do not originate from a camera will not include Maker Notes.
IPTC
The International Press Telecommunications Council (IPTC) standardized the metadata format used for recording information related to press images. Typically this includes the language set (usually UTF-8) and a version number. However, pictures intended for the mass media, such as those provided by Reuters and Getty Images, will usually include attributions such as the photograph's byline, description, location, and much more.
Most digital cameras do not generate IPTC information. Moreover, few cameras offer a means to enter the photographer's name, photo description, and other details. (The few cameras that do support it make the process so difficult that virtually nobody uses this in-camera functionality.)
The presence of IPTC information, particularly with detailed text fields, indicates that the file was modified. At minimum, IPTC information was added by software after the photo was created. While this modification does not indicate that the picture was edited or modified, it does indicate that the file, as a whole, is not a straight-from-the-camera original.
ICC Profile
The International Color Consortium defined a color-space transformation system using a set of ICC Profiles. These are used to ensure that colors display the way they were intended.
Although the color "red" has a specific RGB value (255,0,0), it may display differently on different monitors. This is most apparent at TV stores, when they have a wall of television sets all showing the same thing, but some TVs look brighter than others. The same problem occurs when printing; red on the monitor may not look the same as the red produced by an inkjet printer.
Color profiles are used to convert between the raw RGB values and the intended color tone. Technically, two ICC Profiles are required. The first one converts the raw color to a common colorspace, such as XYZ or L*a*b*. The second profile converts from the common space to the display device (monitor or printer). When a picture contains an ICC Profile, it includes the first half of this transformation: converting from the raw color values to the common colorspace. When rendered for the screen or printing, a second color profile is applied -- the one for your monitor or printer. (Of course, this all assumes that you are using software that supports color profiles, which is usually not the case. Usually color profiles are just ignored metadata.)
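To make the first half of that transformation concrete, the sketch below converts an 8-bit sRGB color to the XYZ colorspace. It does not read an ICC Profile: it hard-codes the standard sRGB transfer curve and the commonly published sRGB-to-XYZ matrix (D65 white point, rounded to four decimals), whereas a real profile carries its own curves, matrices, or lookup tables.

```python
def srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one 8-bit channel (0-255)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


# Linear sRGB to CIE XYZ (D65 white point), rounded to four decimals.
SRGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)


def srgb_to_xyz(rgb):
    """Convert an (R, G, B) triple of 8-bit values to (X, Y, Z)."""
    linear = [srgb_to_linear(v) for v in rgb]
    return tuple(sum(m * v for m, v in zip(row, linear)) for row in SRGB_TO_XYZ)


print(srgb_to_xyz((255, 0, 0)))  # pure red expressed in XYZ
```

For pure red, the result is simply the first column of the matrix. A different source profile would map the same (255,0,0) value to a different XYZ point, which is why the embedded profile matters.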
With the exception of a few high-end cameras, cameras do not generate ICC Profiles. ICC Profiles are added by applications as the file is edited or converted. Many applications -- including Adobe Photoshop -- will default to adding a color profile to the picture.
Most profiles are generic and not hardware specific. For example, "Adobe RGB (1998)" is the Adobe standard and implies that an Adobe software package was used, while "IEC 61966" (also called "sRGB") is the standard color profile used by everyone else. (IEC is the International Electrotechnical Commission, a standards organization.)
Besides colorspace information, the ICC Profile will include the primary platform. This typically represents the system used to add in the profile. If you see "Apple Computer", then the image was likely edited on a Mac. If you see "Microsoft", then it likely came from a Windows PC. However, do not assume that "Hewlett-Packard" means an HP-computer was used; many platforms, including non-HP systems, use the HP color profiles.
Copyright 2003 Apple Computer Inc., all rights reserved.
Within the ICC Profile is the "Profile Date Time" field. This indicates when the profile was initially generated. This does not indicate when the profile was attached to the file. The Profile Date Time field must predate the picture's last save. In most cases, the ICC Profile's date predates the photo by years since it was generated long before the photo was captured.
In some cases, you may also see a profile description that includes the type of device, such as "DELL 2001FP" or "iMac" -- suggesting the hardware used by the person who edited the picture. The profile may be from the manufacturer or specifically created when the user tuned the monitor. For example, a Mac user who calibrates the display will create a new Apple ICC Profile with a date that identifies when the recalibration occurred.
Although many graphics programs will add in color profiles, few will alter the color profile without an explicit action by the user. This means that a picture saved using Apple's iPhoto may include an Apple ICC Profile. If Photoshop edits the same picture on a Windows computer, then it will likely retain the Apple ICC Profile and not reflect the Windows system or Adobe software.
XMP
Not willing to use the existing, standard metadata formats, Adobe created their own Extensible Metadata Platform (XMP). This is an XML (text) block that replicates much of the information found in existing Exif and IPTC records.
XMP is almost exclusively generated by Adobe products. (Exiv2 and Apple's iPhoto and QuickTime programs do generate XMP records, but they are nowhere near as extensive as the ones generated by Adobe applications. Also, Exiv2 and Apple XMP records will lack the "Adobe" name in the metadata.) If an XMP record is present, then it will usually contain a large amount of information about the image. This can include:
Exif data. The original Exif data will be replicated here. If the picture was cropped or resized, then there may be a disparity between the Exif, Maker Note, and XMP information.
Tool identification. XMP typically includes the name and version of software that was used to edit the file and when the edits occurred.
History. XMP records may include a summary of modifications, such as a record of each time the file was saved or converted. This does not specify what happened; it only indicates that something happened. A long history of edits implies that the image was manipulated.
Sources. When multiple pictures are combined, the XMP block will record this as multiple sources. This does not indicate what was combined; it only indicates that something happened. In addition, if a source is included but then deleted (or included but not used), then the XMP record will show the additional source but not indicate the removal.
The presence of an XMP block usually indicates a resave by an Adobe product. Adobe products automatically modify images, which can result in rainbowing and sharpening along high-contrast edges. These show up during an Error Level Analysis.
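Because XMP is plain XML, the tool identification and history entries described above can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a hand-written packet. The namespace URIs are the ones XMP uses, but `SAMPLE_XMP` and its values (including "Example Editor 1.0") are hypothetical, and real packets are usually far larger.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "xmpMM": "http://ns.adobe.com/xap/1.0/mm/",
    "stEvt": "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#",
}

# Hypothetical, hand-written XMP packet used only for illustration.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
    xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
    xmp:CreatorTool="Example Editor 1.0">
   <xmpMM:History>
    <rdf:Seq>
     <rdf:li stEvt:action="created" stEvt:when="2015-03-01T10:00:00"/>
     <rdf:li stEvt:action="saved" stEvt:when="2015-03-01T10:05:00"/>
    </rdf:Seq>
   </xmpMM:History>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def summarize_xmp(packet):
    """Return (creator_tool, [(when, action), ...]) from an XMP packet."""
    root = ET.fromstring(packet)
    tool = None
    history = []
    for desc in root.iter(f"{{{NS['rdf']}}}Description"):
        # This sketch only handles CreatorTool written as an attribute.
        tool = desc.get(f"{{{NS['xmp']}}}CreatorTool", tool)
        for item in desc.iterfind("xmpMM:History/rdf:Seq/rdf:li", NS):
            history.append(
                (item.get(f"{{{NS['stEvt']}}}when"), item.get(f"{{{NS['stEvt']}}}action"))
            )
    return tool, history


print(summarize_xmp(SAMPLE_XMP))
```

As the text notes, the history only shows that something happened and when; it does not describe what changed in the picture.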
PrintIM
The Epson Print Image Matching (PrintIM) data is a proprietary block that provides color enhancement information for Epson printers. This data plays the same role as an ICC Profile.
The creation of the PrintIM metadata block appears to be exclusive to digital cameras. While some graphics editors will strip out the PrintIM metadata when the image is saved, other programs will retain the metadata. This means that the presence of a PrintIM record strongly suggests that the image originated from a digital camera, but this version of the image may not necessarily be a camera-original image.
Photoshop
Each metadata block has a name. One common metadata block is named "Photoshop", even though this metadata record can be generated by many applications and not just Adobe applications. Seeing a metadata block named "Photoshop" is not an indication that Adobe Photoshop was used on the picture.
Many non-Adobe applications use the Photoshop metadata block to store an IPTC digest (a checksum) or other minimal information. In contrast, Adobe applications typically populate this field with much more detailed information.
Other Records
There are plenty of other types of metadata records. Some typically contain nothing more than a version number (e.g., the JPEG File Interchange Format -- JFIF), others contain thumbnail images, and still others contain additional information about the picture or file format (e.g., 8BIM or Photoshop records).
FotoForensics uses ExifTool to extract metadata. ExifTool generates a "Composite" block at the end of each extraction. The composite is not metadata found in the file. Rather, this is ExifTool's high-level summary. It includes the actual image size as well as camera and GPS information from the metadata.
GPS
Many mobile devices, including smartphones and tablets, are capable of capturing global positioning system (GPS) data and storing it as additional metadata. The GPS information can help identify where the photo was taken.
However, there are some significant limitations associated with GPS metadata.
Limitation #1: Uncommon GPS Metadata
Although GPS information can be useful for identifying a location, the data is very uncommon.
There are several reasons that photos often lack GPS information:
Unavailable. Most point-and-shoot cameras lack GPS hardware.
Stripped. Graphical applications and online services that resave pictures end up removing GPS information from pictures. This includes Twitter, Facebook, and most news outlets.
Disabled. Many users disable GPS support on their mobile devices because the GPS sensor dramatically shortens the battery life of the device. (If users have the options of talking 8 hours without GPS or 4 hours with GPS, then most users will disable GPS support.)
Excluded. Most mobile devices include an option to tag photos with GPS information. Due to privacy concerns, many users disable this option. (In some cases, devices default to excluding GPS information from pictures, and users rarely change the default settings.) Thus, even if the device supports GPS and has GPS enabled, the data may not be stored with the photo.
Between stripped metadata, devices that lack GPS data, and devices that do not store GPS information with pictures, few images contain embedded location information.
Limitation #2: Inaccurate GPS Metadata
When you first turn on your smartphone's map application, it typically shows a map with a large blue circle. The circle indicates your approximate location. This does not mean that you are located at the center of the circle; you can be physically located anywhere within the circle, including along the outer edge.
When the application first starts, the circle can be large -- often covering more than a mile (over 1 km). After a few minutes, the device should be able to narrow down the location, resulting in a circle that covers a smaller area. Given enough time, the circle should be able to shrink to a few feet (about a meter), which is enough to pinpoint the location.
The circle of accuracy is based on a measurement called the dilution of precision (DoP). While knowing the DoP is essential for identifying the precise location, few digital cameras record the DoP in the metadata. Without the DoP, there is no easy method for identifying the specific location, or even the range. In general, analysts should assume that no DoP value identifies the center of a circle that is about 2 miles (3 km) in diameter.
The public FotoForensics site receives pictures from all over the world. Yet, less than 1% of photos uploaded to the public site contain GPS data, and 1% of that (0.01%) contain DoP information.
Accuracy vs Precision
There is a significant distinction between accuracy and precision. Precision identifies the number of decimal places. Accuracy identifies the correctness. (3.159257421 is a very precise value, but it is an inaccurate representation of pi (π), which is 3.14...) GPS metadata is very precise -- typically recorded to 6 or 7 decimal places. However, without the DoP's range, the accuracy range is unknown.
For example, the following picture contains GPS information.
GPS without DoP
The metadata identifies the GPS location as 36° 7' 59.63" North, 115° 9' 22.84" West (also written as 36.1332305,-115.1563444). These coordinates pinpoint a location in the parking lot of the Las Vegas Convention Center.
Although these coordinates are very precise, they are also inaccurate. Visually, the photo shows a person (FotoForensics founder Neal Krawetz) sitting in a conference room, and not in a parking lot. The actual photo was taken in 2010 at the Riviera Hotel and Casino -- nearly a half-mile north-west of the GPS position. (The hotel closed in 2015 and is now a larger parking lot located north-west of the GPS position.)
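The two ways of writing the coordinates in this example are related by simple arithmetic. The sketch below converts degrees, minutes, and seconds into signed decimal degrees; the function name `dms_to_decimal` is illustrative.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus N/S/E/W into signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are written as negative values.
    return -value if ref in ("S", "W") else value


lat = dms_to_decimal(36, 7, 59.63, "N")
lon = dms_to_decimal(115, 9, 22.84, "W")
print(f"{lat:.7f}, {lon:.7f}")
```

The results agree with the decimal form quoted in the example to about six decimal places. All of those digits are precision; none of them says anything about how far the reading is from the camera's true position.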
Limitation #3: Alternate GPS Sources
GPS coordinates are typically determined using careful measurements from overhead satellites. Signals from a few satellites permit identifying a location with a large DoP. Signals from dozens of satellites permit narrowing the location down to a few meters. More satellite coverage yields higher accuracy, and the longer the device can listen for satellites, the better the location's accuracy becomes.
Although satellites provide the most common method for determining the GPS location, it is not the only method. Other methods for identifying GPS coordinates include:
Cellsite Triangulation. A cellphone can typically contact a couple of cellular towers. Measuring the signal strength from towers that have well-known locations permits approximating the cellphone's location. In general, this approach is more accurate than satellite coverage within densely populated areas, such as large cities. However, it is much less accurate in rural environments, where cell towers are sparse.
Wi-Fi Positioning. Mobile devices typically include wireless network support. At any given time, your device may hear beacons from a half dozen wireless access points (APs). Each AP has a name (SSID) and hardware address (MAC). There are many databases that include geolocation information for wireless APs. Comparing the list of detectable APs against a database of known AP locations permits triangulating the device's location to within a few hundred feet.
Network Address Geolocation. Network addresses assigned to mobile devices can be used to identify an approximate location. Although this can typically identify a country or state/province, it is much less likely to identify a small city or town. Although there are online services that identify precise latitude and longitude coordinates for network addresses, these services are typically inaccurate and should not be used as authoritative sources for identifying specific physical locations. (If the network geolocation identifies Paris, Texas, then the interpretation should be "in/near Paris, Texas" and not at the specific GPS coordinates.)
Although all of these location systems are recorded as "GPS" metadata, few mobile devices record the data source for the location information. The GPS data may be provided by satellite, Wi-Fi, cellsite, network, or some hybrid combination. (If it is provided, it will be in the EXIF metadata for "GPS Processing Method".)
Unfortunately, there are no devices that record the raw data that determined the location. For example: even if the GPS location was determined through Wi-Fi Positioning, the actual APs used to determine the position are not recorded in the metadata.
Limitation #4: Measurement Errors
Each of these location methods requires direct access to a measurable source. However, buildings and mountains can restrict access. Skyscrapers can block the view of satellites, metal structures can limit cell tower coverage, and buildings can block wireless AP reception. (This is why car navigation systems become confused in dense cities with skyscrapers, parking garages, tunnels, and on deep mountain roads: the GPS unit cannot see the sky and cannot hear navigation satellites.)
When reception is limited in one direction, it can bias the reception and make the GPS data appear to be centered away from the actual location. For example, photos taken inside houses and office buildings typically have GPS data that reflects a location outside the nearest window. (If everything the device can measure is located in one direction, outside of a window, then the GPS coordinates will triangulate to an area outside the window.)
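A toy model shows why one-sided reception pulls the estimate away from the device. The sketch below estimates a position as the signal-weighted average of the known positions of whatever sources can be heard. This is a deliberate simplification and an assumption for illustration, not how any particular device computes its location; the coordinates are made-up points on a flat grid measured in meters.

```python
def weighted_centroid(sources):
    """Estimate a position as the signal-weighted average of source positions.

    sources: list of (x_m, y_m, weight) tuples on a flat local grid in meters.
    """
    total = sum(w for _, _, w in sources)
    x = sum(px * w for px, _, w in sources) / total
    y = sum(py * w for _, py, w in sources) / total
    return x, y


# The device sits at (0, 0) indoors, but every source it can hear is to the east.
heard = [(40.0, 10.0, 1.0), (60.0, -5.0, 1.0), (80.0, 0.0, 2.0)]
print(weighted_centroid(heard))
```

Because every source lies east of the device, the estimate lands well to the east of the true position, in the same way that indoor photos tend to report a location outside the nearest window.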
The Missing DoP Basis
The DoP is a computed ratio between the known output location change and the measured data change between multiple GPS sensor readings. This is why GPS does not provide instantaneous location information: it takes time for the changes between multiple readings to stabilize.
The DoP ratio is typically a value between 1 and 20. Values less than 2 are very accurate, while values higher than 10 are considered weak or poor.
Converting from the DoP ratio to a physical range is inexact because it depends on the sensitivity of the device. For example, if the GPS sensor is accurate to 6 meters, then a DoP of "1" indicates a radius of 6 meters, "2" indicates 12 meters, etc.
Unfortunately, this computation leads to a problem: the sensitivity range for the sensor is never recorded in the metadata. (There is no GPS metadata field for storing this information!) As a result, we make a best guess: typical civilian devices are accurate to 5 meters (16 feet). To estimate the accuracy range, multiply the DoP value by 5 meters (and then convert meters to feet).
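The estimate described above is a single multiplication. The sketch below applies it, using the 5-meter figure as the default sensor accuracy; that default is the best guess from the text, not a value read from any file, and the function name is illustrative.

```python
FEET_PER_METER = 3.28084


def estimate_range(dop, sensor_accuracy_m=5.0):
    """Estimate the accuracy radius as DoP times the assumed sensor accuracy."""
    meters = dop * sensor_accuracy_m
    return meters, meters * FEET_PER_METER


for dop in (1, 2, 10, 20):
    meters, feet = estimate_range(dop)
    print(f"DoP {dop}: about {meters:.0f} m ({feet:.0f} ft)")
```

Passing a different `sensor_accuracy_m` (for example, 6.0 for a sensor that is accurate to 6 meters) shows how much the physical range depends on a value that the metadata never records.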
Fortunately, there are alternatives to the DoP. For example, beginning with iOS 9 (September 2015), Apple iPhone and iPad devices began to record the actual GPS accuracy range in meters. This value is recorded in a metadata field called the "GPS Horizontal Positioning Error".
Limitation #5: Developer Errors
In the best-case situation, the photo includes a precise GPS location and an accurate-enough DoP measurement. However, this still does not mean that the GPS information is extremely accurate. Due to measurement errors and limitations, the actual location may be off by meters. Similarly, while latitude and longitude may be very accurate, the altitude may still be incorrect. (Many mobile devices simply record "0" as the altitude.)
More often, developers record inaccurate GPS information. For example:
The specifications for GPS metadata state that location information is inaccurate when the DoP value is larger than "20" (a value of 20 indicates 100 meters or 328 feet), yet many cameras record GPS information that is inaccurate by miles.
Most devices leave out the GPS metadata when GPS information is unavailable. However, some devices include bogus values. For example, many Android devices record the GPS coordinates as 0° North by 0° West when GPS data is unavailable. Other devices leave the GPS timestamp fields empty or write the date as 0000-00-00 (year zero, month zero, day zero) to indicate an invalid data set.
Many mobile devices also record the timestamp when the measurement was taken. (See the "EXIF GPS Date Stamp" and "EXIF GPS Time Stamp" fields.) The GPS timestamp is typically captured seconds before the photo was taken. However, some devices incorrectly compute the timestamp. Here are just a few examples:
The Samsung GT-I9001 Galaxy S Plus converts from "years since 1900" incorrectly. Instead of adding 1900 to the year, this device adds 2000. The year 2012 would be recorded as 2112.
The Samsung GT-S5830M forgets to add in 1900, so the year 2013 is written as 0113.
The ZTE-C N880S stores binary data in the GPS date field, instead of the text date.
A few devices, like the Nokia Lumia 620, have been observed recording DoP values that are more accurate than reality. For example, the picture may show a person in a building, but the GPS coordinates -- with the DoP's approximate range -- pinpoint an outside location that is not adjacent to any buildings.
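Several of these developer errors can be flagged automatically. The following is a minimal sketch, not a complete validator: the function name, the warning text, and the 1980 cutoff (the start of the GPS epoch) are assumptions for illustration. It expects coordinates already converted to decimal degrees and the GPS date as it was extracted from the file.

```python
import re
from datetime import date


def gps_warnings(lat, lon, gps_date, today=None):
    """Return a list of warnings for GPS values that look like placeholders or bugs."""
    today = today or date.today()
    warnings = []
    if lat is not None and lon is not None and lat == 0 and lon == 0:
        warnings.append("coordinates are 0,0 (likely a placeholder)")
    if not gps_date:
        warnings.append("GPS date is empty")
        return warnings
    if not isinstance(gps_date, str):
        # Some devices store binary data instead of a text date.
        warnings.append("GPS date is not text")
        return warnings
    match = re.fullmatch(r"(\d{4})[:-](\d{2})[:-](\d{2})", gps_date.strip())
    if match is None:
        warnings.append("GPS date is not in year, month, day form")
        return warnings
    year, month, day = (int(part) for part in match.groups())
    if (year, month, day) == (0, 0, 0):
        warnings.append("GPS date is all zeros (invalid data set)")
    elif year > today.year:
        warnings.append("GPS year is in the future (year conversion bug?)")
    elif year < 1980:
        warnings.append("GPS year is before 1980 (year conversion bug?)")
    return warnings
```

A date such as 2112:06:01 or 0113:06:01 produces a single year warning, while a plausible date with nonzero coordinates produces an empty list. An empty list only means that none of these specific problems were found; it does not mean the location is accurate.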
Limitation #6: Interpretation Errors
GPS information attempts to identify where the camera was located when the photo was taken. It does not identify where the subject in the photo is located. For example, zoom settings (both real zoom and digital zoom) can capture a distant object. GPS data will identify the camera's location and not the location of the distant object.
A few cameras do record the compass direction in the metadata, such as a camera facing 12° North. However, even these calculations can be a little off, depending on whether the device is measuring from True North or Magnetic North, and how the compass sensor is mounted inside the mobile device. (My LG P509 was consistently off by 4° clockwise.)
Some cameras also record the distance from the camera to the subject. In general, this is accurate to within a few centimeters for near objects, but it can be off by many meters for distant objects.
Evaluating GPS Data
Despite all of these limitations, GPS metadata is still valuable to evaluate. While it is unlikely to accurately identify a precise location where a photo was captured, it can identify a country, city, or general area. This, together with landmarks and satellite photo services like Google Maps, Bing Maps, and MapQuest, can help narrow down a location and direction.
Many mobile devices record the photo's creation date/time in the local time but the GPS timestamp is in GMT. Moreover, cellular devices typically set the time zone based on the nearest cell tower. Even if the time zone is not recorded in the metadata, the difference between the GPS timestamp and the creation timestamp can help identify the active time zone.
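The time zone comparison can be sketched as follows. This assumes both timestamps have already been extracted as text in the usual EXIF form (YYYY:MM:DD HH:MM:SS) and that the GPS reading was taken within a few minutes of the photo; the function names are hypothetical. Because the GPS reading typically precedes the photo by seconds, the difference is rounded to the nearest quarter hour.

```python
from datetime import datetime, timedelta

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_time(text):
    """Parse a timestamp such as '2015:07:04 14:30:05'."""
    return datetime.strptime(text, EXIF_FORMAT)


def estimate_utc_offset(local_created, gps_utc):
    """Estimate the active time zone as an offset from GMT.

    local_created: the photo's creation time (local time, no zone recorded)
    gps_utc: the combined GPS date and time stamp (GMT)
    """
    diff_minutes = (local_created - gps_utc).total_seconds() / 60
    # Time zone offsets fall on quarter-hour boundaries, so rounding
    # absorbs the few seconds between the GPS reading and the photo.
    quarter_hours = round(diff_minutes / 15)
    return timedelta(minutes=15 * quarter_hours)
```

A stale GPS reading (minutes or hours old) or a camera clock that was set incorrectly will produce a wrong offset, so treat the result as a clue rather than a fact.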
The three main things to remember about GPS data:
In general, pictures rarely contain GPS location information.
When GPS data exists, it identifies a position near the camera's location, but not necessarily the exact location. The GPS data may be off by a mile.
When a photo shows a room inside a building and the GPS position is adjacent to the outside of a building, the actual location is likely through the nearest window, inside the building. (In this case, 'adjacent' means "if you were standing there, then you could probably touch the building.")
Ideally, the GPS data should be used to identify an area near the photo (but not the precise location). Clues in the pictures, such as notable landmarks and buildings, should be compared against the search area in order to identify the picture's specific location.
Unique IDs
Often, investigators may encounter unique IDs within the metadata. These include camera serial numbers and unique picture IDs for tamper detection. Unfortunately, these values are typically unavailable and unverifiable.
Serial Numbers
Some cameras embed a unique serial number for the device within the metadata. When present, this allows an investigator to identify the specific device that generated the photo.
Unfortunately, serial numbers are uncommon and the locations are inconsistent.
Supported Cameras: Relatively few cameras embed serial numbers in the metadata. Most smartphones do not include serial numbers, but many DSLR cameras do. The web site Stolen Camera Finder has a list of cameras that include serial number information. The list includes:
This is not a comprehensive list of makes and models, and not every camera in these series includes the serial number in the metadata.
Serial Number Location: Most manufacturers store the serial number in the vendor-specific 'MakerNotes' metadata section. However, some cameras store the serial number in the EXIF data. Unfortunately, most applications and services remove metadata, so a picture that has been postprocessed may not have this information.
If the serial number exists, then it can be used to identify a potentially stolen camera. Alternately, if a picture is supposed to come from a known camera and the serial number does not match, then fraud can be detected.
There have been some cases where the metadata serial number has been used to identify stolen cameras. And there are a few cases where the serial number identified a picture that was not from the specific camera (wrong serial number). However, this is the exception and not the norm. The norm is that most cameras do not store serial number information, and serial numbers are usually stripped out when pictures are post-processed.
Unique Photo IDs
Some cameras, including smartphones, include unique photo IDs in the EXIF or MakerNotes fields. The intention is to give each photo a unique identifier that can be used to identify tampering or misrepresentation. These unique IDs are often generated using a combination of camera settings, timestamp information, and random identifiers.
A common forgery technique copies the metadata from one picture into another picture. If the forger is not careful, then they may end up with multiple pictures that include the same 'unique' photo ID. Since two pictures should not have the same unique IDs, this permits detecting the forgery. Alternately, if the attributes used to generate the unique ID do not match the metadata, then tampering can be detected.
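Given a collection of pictures, the duplicate check amounts to grouping files by their ID. The sketch below assumes the IDs have already been extracted by a metadata tool; the function name and the (filename, unique ID) input format are hypothetical.

```python
from collections import defaultdict


def find_shared_ids(photos):
    """Group files by unique photo ID and return only the IDs seen more than once.

    photos: iterable of (filename, unique_id) pairs; missing IDs are skipped.
    """
    by_id = defaultdict(list)
    for filename, unique_id in photos:
        if unique_id:
            by_id[unique_id].append(filename)
    return {uid: names for uid, names in by_id.items() if len(names) > 1}
```

A shared ID is a lead, not proof: some vendors generate IDs that are not very unique, so two unrelated photos may legitimately carry the same value.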
While useful, there are significant limitations to these encoded IDs:
Multiple Pictures: Duplication detection requires having at least two pictures. However, investigators may only have one photo.
Undocumented: Most camera vendors do not document the unique ID format. This makes it difficult to compare the camera settings and timestamp information encoded in the unique ID with the other metadata fields that record the same information.
Loosely random: Many vendors include 'unique' IDs that are not very unique. For example, the "Image Unique ID" generated by some Samsung Galaxy smartphones may only differ by a few characters, and different photos by different people could include the exact same "Image Unique ID". (Some Samsung Galaxy smartphones just record the word "IMAGE" and the timestamp in the 'Image Unique ID' field. It really varies by Samsung model.)
Cryptographic and proprietary: Some vendors, such as Canon, include a cryptographic photo signature. If you know the cryptographic algorithm, then you can compare the signature with the metadata and image in order to detect whether the picture is a camera-original. Unfortunately, the cryptographic algorithm is proprietary and Canon does not sell their decoding software to anyone except vetted law enforcement (and even then, it can be difficult and expensive to acquire).
The concept of the 'Image Unique ID' has potential for detecting altered images. However, in practice, these IDs are usually unverifiable.
Warnings about Metadata
There are many different types of metadata and the purpose of the various metadata fields often causes confusion. For example, consider the common "date/time" field:
The EXIF creation or original date/time field identifies when the picture should have been created.
The EXIF modified date/time field identifies when the picture should have been last modified.
With some cameras, the modified timestamp may predate the creation time by a second. This happens when the modified time denotes when the file was first opened for writing, while the creation time identifies when the writing completes. The difference could be as little as a tenth of a second: 0.9 seconds becomes 1.0 seconds, and the recorded times appear different.
The GPS date and time identifies when the GPS data was read. This typically predates the image capture by a few seconds.
The IPTC date and time approximates when the photo was taken. It may have been entered manually or by an application that used the EXIF time stamp (whether creation, original, or last modified varies by the application).
The XMP date and time could be when the photo was created, when the XMP record was added, or when an alteration occurred.
And the ICC Profile date time indicates when the color profile was created, and not when it was attached to the file. The ICC Profile's timestamp is independent of the file's creation date.
Although these are all date/time fields, they serve different purposes that could be misinterpreted. (It's very common for people to confuse the ICC Profile time stamp with the picture's creation time.)
Accuracy and Reliability
Most metadata structures are populated by automated systems. We typically accept it at face value. Unfortunately, this does not prevent applications and malicious users from changing the metadata values. For this reason, we must evaluate the metadata and confirm that it matches the visual content. For example, if the metadata says the image should be 1024x768 and rotated 90°, then it should be compared against the actual image dimensions and visual content.
If the metadata matches the content, then there is nothing suspicious.
If there is a mismatch, then we know something was changed. It could be an alteration to the metadata, a change to the visual content, or both.
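A first-pass dimension check can be sketched as follows. The function name and the three result labels are hypothetical, and the decoded dimensions would come from whatever image library you use to open the file.

```python
def compare_dimensions(claimed, actual):
    """Compare the metadata's (width, height) against the decoded (width, height)."""
    if tuple(claimed) == tuple(actual):
        return "match"
    if tuple(claimed) == tuple(reversed(actual)):
        # Same numbers, opposite order: consistent with a 90-degree rotation
        # that was applied to the pixels or the metadata, but not both.
        return "swapped"
    return "mismatch"
```

For example, metadata that claims 1024x768 for a file that decodes as 768x1024 returns "swapped". Neither "swapped" nor "mismatch" says which side changed; it only identifies that the metadata and the visual content need a closer look.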
Altering Metadata
Most metadata formats lack any kind of internal consistency checks.
Textual descriptions, like those found in EXIF, IPTC, and XMP comments, are manually added. A person may enter incorrect information for the description, artist, or location.
Malicious or intentional changes to the metadata can often be done with simple binary editors, command-line tools, or graphical applications. While many common tools leave artifacts that denote alterations, it is possible to make undetectable changes.
A few metadata formats include checksums for tamper detection. For example, ICC Profiles include a checksum to make sure the profile is intact. (Either it's the MD5 value for the ICC Profile, or all zeros to denote no checksum.) Similarly, XMP may include a checksum that covers any IPTC records. While changing the checksums requires a little technical knowledge, these can always be recomputed after any changes are made.
For formats that support internal checksums:
An invalid checksum identifies that something changed.
A valid checksum does not mean the values are unaltered.
While "unaltered with valid checksums" is the typical case, it is not a guarantee. For this reason, all metadata should be cross-checked for consistency with other metadata and the visual content.
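The ICC Profile case illustrates both points. The sketch below assumes the layout described in the ICC specification: the 16-byte profile ID is stored at bytes 84 to 99 of the header, and the MD5 is computed over the whole profile with the flags field (bytes 44 to 47), the rendering intent field (bytes 64 to 67), and the ID field itself set to zeros. Verify those offsets against the specification before relying on them. The function names are hypothetical.

```python
import hashlib

# (start, end) byte ranges that are zeroed before hashing (assumed from the ICC spec)
ZEROED_FIELDS = ((44, 48), (64, 68), (84, 100))


def _profile_md5(profile):
    scratch = bytearray(profile)
    for start, end in ZEROED_FIELDS:
        scratch[start:end] = bytes(end - start)
    return hashlib.md5(scratch).digest()


def icc_profile_id_status(profile):
    """Return 'absent', 'valid', or 'invalid' for the embedded profile ID."""
    if len(profile) < 128:
        raise ValueError("an ICC profile header is 128 bytes long")
    stored = bytes(profile[84:100])
    if stored == bytes(16):
        return "absent"  # all zeros denotes no checksum
    return "valid" if stored == _profile_md5(profile) else "invalid"


def recompute_icc_profile_id(profile):
    """Return a copy of the profile with a freshly computed ID."""
    updated = bytearray(profile)
    updated[84:100] = _profile_md5(profile)
    return bytes(updated)
```

Changing any byte outside the zeroed fields makes icc_profile_id_status report "invalid". Passing the altered profile through recompute_icc_profile_id makes it report "valid" again, which is exactly why a valid checksum cannot be treated as evidence that the values are unaltered.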
Misleading Metadata: C2PA
The Coalition for Content Provenance and Authenticity (C2PA) has attempted to define a way to track a picture's origin (provenance) and authenticate the metadata. Their documentation claims that the system returns validated and authenticated results with tamper detection. However, this is far from the truth.
C2PA uses a strong cryptographic signature for tamper detection. Unfortunately, this approach is fundamentally flawed because it is based on the 'honor system':
Untrusted metadata: C2PA records often replicate metadata that is typically found in EXIF, XMP, IPTC, or other metadata fields. While the standard metadata fields (EXIF, XMP, etc.) could be altered, these fields are often populated by automated systems. In contrast, the metadata found in C2PA's JUMBF metadata was explicitly copied or populated as an additional step. For example, EXIF GPS data likely came from a GPS sensor, but the "JUMBF Exif GPS" data was explicitly copied from somewhere else. This additional processing step means the data is easier to alter and should not be explicitly trusted.
No Validation: C2PA uses a cryptographic signature. However, it does not validate the metadata prior to signing. A valid signature only means that the data was not altered after being cryptographically signed. Putting strong cryptography over unverified data does not make the data more trustworthy.
No details: The C2PA signature can detect if the metadata (EXIF, IPTC, XMP, etc.) or visual content has been altered after it was signed. However, it does not identify what was altered.
Untrusted components: C2PA permits including other images as components for a composite image. However, specifying that an inclusion occurred is optional, the included component may lack C2PA information, and even if it has C2PA information, it is not always verified prior to any subsequent digital signatures.
Unverified components: A C2PA-signed file can be used as a prerequisite or component of a newer C2PA-signed file. However, the inner signature may not be verifiable by the outer signature. As a result, dependent signatures cannot always be trusted. Some signing tools, like Microsoft Designer, can be used to sign anything and invalid internal signatures will appear to be trusted.
Unauthenticated ownership: C2PA does not authenticate that the cryptographic signature belongs to the person or company that is doing the signing. If you know how to use 'openssl', then it is trivial to create a forged signature for false attribution.
Undetected replaced signatures: C2PA cannot detect when a cryptographic signature is removed or replaced with a different signature. This trivially defeats C2PA's tamper detection system. In effect, anyone can sign anything with a signature that looks like anyone else.
Untrusted timestamps: The timestamp provided by the trusted timestamp authority is not validated; it can easily be altered without detection. In effect, you must trust that nobody altered the timestamp.
Untrusted certificates: C2PA uses a chain of certificates. Unfortunately, the chain is often not linked to a known trusted certificate authority. As a result, the certificates are untrusted even if they cryptographically decode the digital signatures. If the industry standard for trusted certificate authorities (the CCADB) does not trust the C2PA validation chain, then why should you?
Vendor-specific: Rather than using the industry-standard trusted root certificates, C2PA's implementation requires using their own certificate chain that is managed and distributed by CAI/Adobe. This makes their solution vendor-specific and proprietary. Moreover, they permit vendors to register root and intermediary certificates, which allows these Adobe-approved vendors to issue certificates for anyone and enables undetectable impersonations.
Deviated standards: C2PA uses "X.509" certificates, but does not adhere to the X.509 standard. In particular, C2PA does not support revocation and explicitly ignores expiration dates.
From a computer security perspective, C2PA lacks authenticity, authorization, integrity, and non-repudiation. (It fails four of the five key security principles. The fifth principle, confidentiality, does not apply since the purpose of metadata is to identify sources.)
C2PA Adoption
C2PA is sponsored by very large tech companies, including Adobe, Microsoft, Intel, and Sony, and is allegedly supported by hundreds of other companies. This gives a false sense of trustworthiness. (Proof by Peer Pressure isn't a reliable validation method.)
Currently C2PA is included in products by Adobe, Microsoft, Truepic, Qualcomm, Leica, and others. These companies promote C2PA as a system that provides authenticated provenance information and that includes tamper detection. Unfortunately, this is misleading. With a little technical knowledge, the data can be altered without detection. C2PA permits false information and false attribution to appear with a valid cryptographic signature. The backing of large tech companies and false product claims gives the impression that the data is authoritative and verifiable, when it is neither.
On the public FotoForensics service, we are already seeing known-fraud groups experiment with forged C2PA metadata to give their forgeries the appearance of validity and authenticity. (If more companies deployed C2PA, then we're certain that the fraud groups would be exploiting this system.)
If you see a file with C2PA metadata, be sure to use other tools to validate any metadata values and to cross-validate the information with the visual content. Do not assume that the metadata is valid, even if there is a valid cryptographic signature.
The file may be inadvertently altered due to how it was handled. Simply importing an image into the Windows Photo Gallery or including a picture as an email attachment can alter the metadata and result in an invalid C2PA signature. Even if the C2PA signature is invalid, the metadata may be correct. Do not assume that an invalid signature denotes an intentional alteration.
Regardless of whether the C2PA signatures are valid or invalid, you cannot rely on C2PA to authenticate the file's contents.
Additional Information
For more information about the problems with C2PA, see the Hacker Factor blog:
There are a variety of solutions for labeling content when it is generated by AI, including C2PA. None of these methods are consistent or reliable. Moreover, the focus on AI, rather than on the human contribution, directly impacts copyright and patent protections.
The C2PA ecosystem contains many different workgroups, including C2PA, CAI, CAWG, and Project Origin. However, there is an incredible amount of overlap between these organizations and Adobe. CAI and CAWG are Adobe-driven efforts, C2PA is almost entirely driven by Adobe, and Project Origin is effectively dependent on C2PA/CAI/Adobe. While C2PA's solution claims to be provided by an industry coalition, it appears to be an Adobe-centric effort. Rather than being focused on authentication, it appears to be focused on tracking.
CBC uses a three-legged approach. The first leg, provenance, is unreliable. The second leg is watermarking. This blog shows how to identify, alter, and forge watermarks -- assuming you can figure out who provides the watermark. This means that C2PA's watermarking option is just as unreliable as their provenance solution.
CBC/Radio Canada released a C2PA example that uses conflicting signatures and contradicting metadata. This example provides unreliable provenance and questionable authentication.
C2PA does not use the commercially-accepted "root trust store" that identifies vetted and trusted root certificates. Instead, C2PA/CAI uses their own untrusted list of root certificates that is vetted and managed by CAI/Adobe, making C2PA a vendor-specific and proprietary solution. This implementation also permits forged signatures to impersonate anyone and appear authentic.
LinkedIn deployed a C2PA solution. Unfortunately, it (1) provides misinformation and misleading information, (2) omits known attribution, (3) fails to report provably inaccurate information (even though they clearly detect it and then choose not to report it), and (4) prevents independent verification of their results.
The cryptographically signed and authenticated timestamps used by C2PA are neither authenticated nor signed; they are trivial to alter without detection. In addition, the entire three-prong approach used by C2PA is, to quote a report by IEEE Spectrum, "flimsy at best".
The BBC released their first verification of content using C2PA. Unfortunately, they validated, authenticated, and attributed provenance to a known forgery. Specifically, the audio track predates the authenticated video by at least 3 months, and the video has numerous splices that suggest severe editing along with content combined from different videos.
Frequently asked questions about C2PA's limitations and a detailed example of the worst-case scenario: using a forgery to frame someone for a serious crime.
A deep dive into multiple C2PA limitations. These are fundamental problems and are independent of the implementation; all implementations will have these problems. The blog details four different ways to defeat the cryptographic authentication and enable forgeries. Each of the examples uses pictures seen in the real world. The authentic-looking forgeries are created using open source tools provided by Adobe.
The C2PA solution by Adobe includes a sample picture that fails to identify the authenticated source. This shows that a valid signature does not provide provenance or authenticity, even though it's in the name: C2PA.
The C2PA solution implemented by the New York Times includes provably false and inaccurate information. Again, the valid signature does not provide provenance or authenticity.
The C2PA solution by Starling Lab included a sample picture that has a valid cryptographic signature but provably altered metadata. It demonstrates that the metadata is not validated or authenticated by this authentication solution.