OSINT foundations and methodology

What open-source intelligence is, the intelligence cycle, the passive/active distinction, legal constraints, and investigator OPSEC.

What OSINT is

Open-Source Intelligence (OSINT) is the collection and analysis of information from publicly available sources: no hacking, no social engineering, no insider access. The discipline predates the internet; it began with the monitoring of newspapers, radio broadcasts and published government documents. In a security context it now covers anything publicly accessible: websites, social media, domain registration records, satellite imagery, breach databases, job listings, court records, and much more.

OSINT matters to attackers and defenders alike because of the volume of information people and organisations unintentionally expose. A job posting reveals the internal technology stack. A LinkedIn profile exposes org-chart detail and the company's email format. A developer's public GitHub repo contains a hardcoded API key committed three years ago. An employee's Instagram photo, tagged at the office, shows part of the building's physical layout. None of this required a single unauthorised action to collect.

OSINT is primarily a passive reconnaissance technique: you gather information without directly touching the target's infrastructure. Because you're reading public data, often through third parties that have already collected it, rather than sending packets to the target, it's quiet, hard for the target to detect, and (when done against genuinely public data) generally lawful. This is what separates it from active reconnaissance, where you send probes directly to the target (scanning ports, probing services): that is noisy, detectable, and requires explicit authorisation.

The intelligence cycle

OSINT doesn't mean "search for everything and hope." Effective intelligence collection follows a cycle:

Planning and direction: define what question you're trying to answer. "What is this company's internet-facing attack surface?" is a useful question. "Find everything about the company" is not. A clear requirement focuses collection and keeps you from drowning in noise.
Collection: gather raw data from sources relevant to the question. At this stage you're not judging quality, just capturing.
Processing: convert raw data into a usable form: translating, decoding, structuring, deduplicating.
Analysis: apply judgement. Look for patterns, corroborate findings across independent sources, and weigh reliability. A single source is a lead; two independent sources give confidence.
Dissemination: present findings in a form the audience can act on: a report, a link graph, a timeline.
Feedback: gaps in the output loop back into new collection requirements. Intelligence is iterative.

In practice, OSINT for a penetration test or CTF often compresses this: plan (scope the target), collect breadth-first (all domains, all employees, all exposed services), process into a structured profile, and analyse to identify the most promising attack paths.
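The Analysis rule (one source is a lead, two independent sources give confidence) can be applied mechanically once findings have been processed into a structured profile. The Python sketch below is illustrative only: the `Finding` record, its fields and the `grade` helper are hypothetical names, not part of any OSINT tool, and the sample findings are invented. Note that it counts distinct source labels; whether two sources are genuinely independent (rather than, say, two sites repeating the same press release) remains the analyst's call.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    claim: str   # what was observed, e.g. "Mail hosted on Microsoft 365"
    source: str  # where it was observed, e.g. "dns-mx" or "job-posting"


def grade(findings: list[Finding]) -> dict[str, str]:
    """Group findings by normalised claim and grade each by distinct sources."""
    sources_by_claim: dict[str, set[str]] = defaultdict(set)
    for f in findings:
        # Processing: normalise case and whitespace so duplicates collapse.
        sources_by_claim[f.claim.strip().lower()].add(f.source)
    # Analysis: one source is a lead; two or more distinct sources corroborate.
    return {
        claim: "corroborated" if len(srcs) >= 2 else "lead"
        for claim, srcs in sources_by_claim.items()
    }


# Hypothetical sample findings, for illustration only.
example = [
    Finding("Mail hosted on Microsoft 365", "dns-mx"),
    Finding("mail hosted on Microsoft 365 ", "job-posting"),
    Finding("VPN is Cisco AnyConnect", "job-posting"),
    Finding("VPN is Cisco AnyConnect", "job-posting"),  # same source repeated
]
print(grade(example))
# {'mail hosted on microsoft 365': 'corroborated', 'vpn is cisco anyconnect': 'lead'}
```

Normalising claims before grouping is the Processing step doing its job: without it, trivially different phrasings of the same fact would each look like an uncorroborated lead, and a source repeating itself would look like corroboration.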

Legal and ethical constraints

The distinction between "publicly available" and "lawful to use" is subtler than it looks.

In the UK, gathering and analysing public data is generally lawful, but constraints apply:

Computer Misuse Act 1990: OSINT doesn't involve unauthorised access, so it typically doesn't engage the CMA. However, if an OSINT step accesses a system without authorisation (for example, through an API used outside its permitted terms), it can cross into CMA territory.
UK GDPR: personal data you collect is subject to data protection law, even if it is public. Processing individuals' personal information requires a lawful basis. On an authorised engagement this is usually legitimate interest or contractual necessity; working independently, you still have obligations around retention and purpose.
Scope: on an authorised engagement, the scope defines what targets you may investigate. Investigating individuals outside scope (even from public sources) can breach the agreement and, for personal data, GDPR.
Platform terms of service: tools and services such as Maltego, Shodan and Have I Been Pwned (HIBP) each have their own terms of service. Automated scraping at scale can breach platform terms, and using findings to harass or defraud is clearly harmful regardless of whether the data was technically public.

Public availability reduces the harm of collection but does not eliminate your legal and ethical duties as an investigator.

Investigator OPSEC

A careless OSINT investigator tips off the target. Visiting a company's website from your corporate IP, searching a name while signed into Google, or downloading a document that phones home with telemetry all reveal that someone is looking. For red team, journalism and some research applications, keeping your investigation covert matters.

Sock puppet accounts: create separate research personas on social media platforms to avoid linking your investigation to your real account. Use a fresh email address, a VPN or Tor for registration, avoid reusing passwords, and don't connect the persona to any real-identity services.

Isolated research environment: run OSINT work from a dedicated virtual machine with no association to your normal identity. Take snapshots so you can wipe and restore. Don't log in to personal accounts from this VM.

Network separation: route investigative traffic through a VPN (ideally in a neutral jurisdiction), through the Tor network for highest anonymity, or through a dedicated investigation service. Note that Tor exit-node IPs are publicly listed, so sophisticated targets may spot Tor-sourced traffic.
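To see why Tor traffic stands out, consider the target's side. At the time of writing the Tor Project publishes a bulk list of exit-node addresses (https://check.torproject.org/torbulkexitlist, one IP per line); treat the URL and format as assumptions to verify. A defender who downloads it can flag any visitor whose address appears there. The sketch below performs that membership check on list text already fetched; `parse_exit_list` and `is_tor_exit` are hypothetical helper names, and the sample uses RFC 5737 documentation addresses rather than real exits.

```python
import ipaddress
from typing import Set, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_exit_list(text: str) -> Set[IPAddress]:
    """Parse list text (assumed one address per line) into a set of IP objects."""
    exits: Set[IPAddress] = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # tolerate blank lines and comments
        try:
            exits.add(ipaddress.ip_address(line))
        except ValueError:
            continue  # ignore anything that is not a bare IP address
    return exits


def is_tor_exit(ip: str, exits: Set[IPAddress]) -> bool:
    # ip_address() normalises notation, so "2001:db8::1" and
    # "2001:0db8:0:0:0:0:0:1" compare equal.
    return ipaddress.ip_address(ip) in exits


# Sample list text using RFC 5737 documentation addresses, not real exits.
sample = """
192.0.2.10
198.51.100.7
"""
exits = parse_exit_list(sample)
print(is_tor_exit("192.0.2.10", exits))   # True
print(is_tor_exit("203.0.113.5", exits))  # False
```

Parsing into `ipaddress` objects rather than comparing raw strings avoids false negatives when the same IPv6 address is written in different notations.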

Passive sources first: prefer intermediary services that hold data already collected about the target (Shodan, SecurityTrails, cached pages) over visiting the target directly. Your queries go to the intermediary, which inserts a layer between your IP and the target.

Document everything: note sources, timestamps and exact search queries alongside findings. In a report or legal proceeding, provenance matters. Take screenshots, not just notes, because pages change or disappear.
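A minimal sketch of a provenance log, assuming you save each screenshot or page capture as a file and append one JSON record per finding. The field names and the helpers `provenance_record` and `append_to_log` are hypothetical; the SHA-256 digest lets you later show that a saved capture is byte-for-byte the one you logged.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(source_url: str, query: str, capture: bytes, note: str = "") -> dict:
    """Tie a finding to where, when and how it was captured."""
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "source_url": source_url,
        "query": query,  # the exact search string, copied verbatim
        # Digest of the saved screenshot/page bytes: shows later the file is unchanged.
        "sha256": hashlib.sha256(capture).hexdigest(),
        "note": note,
    }


def append_to_log(path: str, record: dict) -> None:
    # JSON Lines: one record per line, file opened in append mode only.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


# In practice `capture` would be the saved file's bytes, e.g. Path("shot.png").read_bytes().
record = provenance_record(
    source_url="https://example.com/careers",
    query='site:example.com "careers"',
    capture=b"<bytes of the saved screenshot>",
    note="Job ad names internal tooling",
)
print(json.dumps(record, indent=2))
```

JSON Lines keeps the log append-only and easy to grep; recording time in UTC avoids ambiguity when investigators or readers work across time zones.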

Minimising your own OSINT footprint

Defenders need this chapter too. The same techniques attackers use to profile a target can be applied to your own organisation. An OSINT audit of your own footprint covers:

Domain registration data and reverse WHOIS for all associated domains
All internet-facing systems visible on Shodan and Censys
Subdomains discovered via certificate transparency
Employee data on LinkedIn and company pages
Credentials appearing in breach databases
Secrets in public code repositories
Documents published with retained metadata
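Certificate transparency is a good example of a passive source: publicly trusted TLS certificates are, in practice, logged, so searching a CT service reveals subdomain names without sending anything to the target. The sketch below assumes crt.sh's JSON output (`?q=%.example.com&output=json`), where each record carries `common_name` and a `name_value` field holding newline-separated names; check these field names against a live response before relying on them. `fetch_crtsh` and `subdomains_from_ct` are hypothetical helpers, and only the parsing function is exercised by the offline sample.

```python
import json
import urllib.parse
import urllib.request


def fetch_crtsh(domain: str) -> list:
    """Query crt.sh for certificates under domain (network call; assumed query form)."""
    url = "https://crt.sh/?" + urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def subdomains_from_ct(records: list, domain: str) -> list:
    """Extract unique names at or under domain from CT records."""
    domain = domain.lower().rstrip(".")
    found = set()
    for rec in records:
        # Assumed format: name_value holds newline-separated certificate names.
        names = (rec.get("name_value") or "").split("\n") + [rec.get("common_name") or ""]
        for name in names:
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # a wildcard still reveals its parent name
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


# Offline sample shaped like crt.sh output (illustrative, not real data).
sample_records = [
    {"common_name": "www.example.com", "name_value": "example.com\nwww.example.com"},
    {"common_name": "*.dev.example.com", "name_value": "*.dev.example.com"},
    {"common_name": "vpn.example.com", "name_value": "VPN.example.com"},
    {"common_name": "example.org", "name_value": "example.org"},
]
print(subdomains_from_ct(sample_records, "example.com"))
# ['dev.example.com', 'example.com', 'vpn.example.com', 'www.example.com']
```

Names found this way are candidates, not confirmed live hosts: some will point at nothing or at forgotten systems, which is exactly what the audit is looking for.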

The goal is to identify exposure before an attacker does, then reduce it: remove unnecessarily public information, enable registrar privacy, remove stale subdomains, and rotate any discovered credentials.
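Finding credentials to rotate usually starts with pattern matching. The sketch below scans a checked-out repository's working tree for one illustrative pattern, AWS access key IDs (20 characters beginning `AKIA` for long-term keys or `ASIA` for temporary ones), and redacts matches so the report itself doesn't leak them. `scan_text` and `scan_tree` are hypothetical helpers; dedicated scanners such as gitleaks or TruffleHog carry far larger rule sets and also search Git history, which matters because a key removed in a later commit remains in the repository's history.

```python
import re
from pathlib import Path

# One illustrative rule only; real secret scanners ship many more.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_match) pairs for suspected AWS key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in AWS_KEY_ID.finditer(line):
            key = m.group(0)
            # Keep only the prefix and last four characters in the report.
            hits.append((lineno, key[:4] + "*" * 12 + key[-4:]))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every readable text file under root, skipping the .git directory."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            hits = scan_text(text)
            if hits:
                results[str(path)] = hits
    return results


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
print(scan_text('aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'))
# [(1, 'AKIA************MPLE')]
```

Because this only reads the current files, a clean result does not mean the repository is clean; history needs a scanner that walks every commit.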

Quick recall

OSINT = publicly available sources, no unauthorised access. Passive and generally lawful; UK GDPR still applies to personal data even if public.
Intelligence cycle: Plan → Collect → Process → Analyse → Disseminate → Feedback. Never collect without a clear requirement.
Investigator OPSEC: sock puppet identities, isolated VM, VPN or Tor, passive services before direct visits, document sources and timestamps.
Defensive use: run the same techniques against yourself. Exposed credentials, forgotten subdomains, metadata in docs and shadow IT are your own critical findings.