Understanding the basics of internet investigations can go a long way toward helping you answer questions and solve problems. You can also use this understanding to decide what information you share online.
The Internet is a vast, open data collection that can be an extremely valuable tool for analysts, investigators, and researchers. Let’s start by looking at how the Internet is divided up.
Social media is an invention of the modern web, or as older folks like to call it, Web 2.0.
Web 2.0 is a collaborative place where online activity has evolved around communication and sharing. Social media is so prolific and revealing that investigators love looking at public social profiles for content.
There are all kinds of information to glean from social media, such as:
- Known associates, friends, and family.
- Likes, causes, hobbies, and passions.
- History of events.
- Photo and video content.
- Reverse image searches.
- Links to other public web content, such as:
  - Other social accounts.
  - Forums and chat profiles (IRC, for example).
If you run a Google search of your personal information, you may find yourself on a people database site. Although these databases have legitimate uses, they can also be an essential tool for internet sleuths.
These databases gather public information about individuals from various sources, including social media, public records, and court documents. The information collected can include the following:
- Email address
- Physical address and last known addresses
- Phone numbers
- Family information
- Employment history
- Criminal records
- and even maps and photographs
Most people don’t realize how much of their information is collected without their consent. Many amateur internet sleuths do not hold their personal investigations to an ethical standard, which means respecting individuals’ privacy and not using the information obtained in a harmful or malicious manner.
Ultimately, people databases can be a valuable resource for those involved in online investigations for legitimate purposes like employment screening, but it’s essential to use them responsibly and with consideration for individuals’ privacy. Unfortunately, you have no way to enforce that standard on others short of removing yourself from these databases.
The Deep Web
Over a billion websites and even more indexed pages are available through search engines. Would it surprise you that this is only a fraction of the websites and pages out there?
By some estimates, that number of websites and pages may make up only about 12% of the web.
Most people confuse the deep web with the dark web. The deep web is a huge number of websites within public reach that a search engine hasn’t indexed.
Update 11/1/17: there’s a great writeup by Daniel Miessler on the differences in this topic.
If you created a new website, you would fall into this category. Websites that are new or websites that have been around but haven’t been indexed can be reachable by direct links or using specific deep web search tools, databases, and techniques.
The Dark Web
Users on the dark web specifically and intentionally hide their websites from view.
Criminals do operate on the dark web, and all manner of objectionable content is published there. But before you raise your pitchforks and chant to abolish the dark web so we’re all safer, know something important: by this definition, your company HR portal, where you must be in the office to download your paystub, is considered dark web.
It makes sense, though, right? Why would you want your Intranet publicly exposed?
There are a few techniques for hiding your website in the dark web. Some are as simple as fooling indexing robots, like the ones used at Google, or telling them not to index certain pages. Other methods involve hiding behind Tor.
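One common way of telling indexing robots not to index certain pages is a robots.txt file. The sketch below uses Python’s standard urllib.robotparser to check whether a crawler that honors those rules would be allowed to fetch a page. The robots.txt content, the example.com URLs, and the helper name is_crawlable are all made up for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /intranet/
Disallow: /paystubs/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/blog/post", "/intranet/hr"):
        url = "https://example.com" + path
        print(url, "crawlable:", is_crawlable(ROBOTS_TXT, url))
```

Keep in mind that robots.txt is only a request to well-behaved crawlers, and the file itself is publicly readable, so the paths it lists can also serve as hints for an investigator.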
There are limitations to what can be searched and how big of a footprint you make, but it’s still possible to find dark websites.
The Archived Web
Do you remember when people told you stuff on the Internet couldn’t be deleted and was online forever? This is partially why.
Even though web content can be edited or deleted, other websites and services, such as The Internet Archive, allow you to see content as it was in specific time periods. It’s like a nerd’s virtual time machine.
Looking at a website’s history helps when investigating a website compromise, but it’s also kind of fun to look up old websites.
For a good laugh, we may publish an article looking at earlier versions of the most popular web services.
Archive Web Providers
This section was updated on 5/2/19.
- Google Cache – use cache: before the full URL or append the full URL to
- WayBack Machine – this refers to the Internet Archive (web.archive.org)
- Archive.today – this has many other top-level domains (like archive.is).
- WebCite – you can also append the full URL to
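As a rough illustration of the WayBack Machine entry above, the sketch below builds a link to an archived copy of a page near a given date. It assumes the commonly seen web.archive.org/web/<timestamp>/<url> address pattern, where the timestamp is written as YYYYMMDDhhmmss; the function name wayback_url and the example.com address are hypothetical.

```python
from datetime import datetime

# Assumed address prefix for Internet Archive snapshots.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, when: datetime) -> str:
    """Build a Wayback Machine link for `url` as it looked around `when`."""
    # Snapshot timestamps use the form YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{url}"


if __name__ == "__main__":
    print(wayback_url("https://example.com/", datetime(2017, 11, 1)))
```

If no snapshot exists for that exact moment, the service typically redirects to the closest capture it has, so the date only needs to be approximate.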
Safe Site Links
If a website has no history or doesn’t appear on the archived web, you may need to check reputation sites to see who else has been dinged.
Ideally, you want something to go on, but having a clean scan doesn’t entirely mean everything is on the up and up.
| Scanner | Clean Scan Message |
|---|---|
| Google Safe Browsing | Not currently listed as suspicious |
| McAfee SiteAdvisor | Didn’t find any problems |
| Norton Safe Web | Found no issues with this site |
| Sucuri SiteCheck | Verified clean / Not blacklisted |
| AVG ThreatLabs | No active threats were reported |
Update: AVG ThreatLabs has been discontinued. We linked to their web safety guide instead.
Internet Investigations Conclusion
As more people create online profiles and post publicly available information, internet investigations become simpler; any armchair detective can now dig up all sorts of details about a person if they have the proper internet search skills.
All it takes is a bit of internet sleuthing to find out names, addresses, and even telephone numbers to start piecing together the lives of those passing through our interconnected world.
These investigations have never been easier or more accessible, with all of the resources and public records available today to help put the puzzle pieces together. The key is to limit the amount of information we share as much as possible while still being included in the connected world.