ApoorvCTF 2026 - Project Mirrorfall - AI Writeup

Category: AI
Flag: apoorvctf{7d88323_0.0245}

Challenge Description

Your first task is to locate a public archive serving as an archival mirror for the 2013 intelligence disclosures.

Within this archive, locate the raw PDF classification guide dated September 5, 2013 that corresponds to the overarching US encryption defeat program.

Variable X: extract the first 7 characters of the latest commit SHA for this exact PDF file. Do not use the repository’s main commit hash.

Download the raw PDF classification guide. Navigate through the dense administrative caveats to Appendix A, which lists the program’s specific capabilities.

Locate the “Remarks” column corresponding to the list of Exceptionally Controlled Information (ECI) compartments used to protect these details.
The first ECI listed is APERIODIC. Identify the second ECI compartment listed immediately after it (an 8-letter codeword).
Normalize this codeword.

Process the extracted ECI codeword through a specific semantic embedding model.

Initialize the all-MiniLM-L6-v2 model.

Pass the normalized, 8-letter ECI codeword into the model to generate its tensor embedding array (model.encode()).
Variable Y: Extract the first floating-point value from the resulting embedding array (Index 0) and round it to 4 decimal places.
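"Round it to 4 decimal places" is easiest to get right with fixed-point string formatting rather than `round()`: `round()` returns a float, and printing that float drops trailing zeros, so a value like 0.02 would come out as `0.02` instead of `0.0200`. A small sketch (the helper name `format_y` is our own, not part of the challenge):

```python
def format_y(y: float, decimals: int = 4) -> str:
    """Fixed-point format: always exactly `decimals` digits after the point."""
    return f"{y:.{decimals}f}"


print(format_y(0.02446681074798107))  # 0.0245
# round() keeps a float, so trailing zeros vanish when it is printed:
print(str(round(0.02, 4)), format_y(0.02))  # 0.02 0.0200
```

The raw value used here is the one the embedding step in this writeup produces.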

Analysis

This one looked like an OSINT-plus-reproducibility trap at first: the wording pushes you toward broad archive hunting, but the solve path becomes clean once you lock onto a concrete mirror and keep every step deterministic. I used a small script that queries the GitHub API for a known Snowden mirror (iamcryptoki/snowden-archive), looks up the latest commit that touched the target PDF, downloads that exact file from raw.githubusercontent.com, and parses the APERIODIC line directly from the extracted text so there is no manual ambiguity.
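The "latest commit SHA for this exact PDF file" is what GitHub's list-commits endpoint returns when it is filtered with `path`; the repository HEAD can be a later commit that never touched the PDF, which is why the challenge rules it out. A minimal offline sketch of that step, assuming the response lists commits newest first (the same assumption the script relies on when it reads `commits[0]`). `file_commits_request` and `short_sha` are hypothetical helper names, and the HTTP call itself is left out so the logic can be checked without network access:

```python
import re

API = "https://api.github.com"
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def file_commits_request(owner: str, repo: str, path: str, branch: str):
    """Build the URL and query params for GitHub's list-commits endpoint,
    restricted to commits on `branch` that touched `path`."""
    url = f"{API}/repos/{owner}/{repo}/commits"
    params = {"path": path, "sha": branch, "per_page": 1}
    return url, params


def short_sha(commits: list, n: int = 7) -> str:
    """Take the newest commit from a list-commits response and abbreviate it."""
    if not commits:
        raise ValueError("no commits touched this path")
    full = commits[0]["sha"]
    if not FULL_SHA.fullmatch(full):
        raise ValueError(f"not a full 40-hex commit SHA: {full!r}")
    return full[:n]


# Hand-written response with the same shape as the API's JSON; the SHA value
# is the one this writeup's run printed.
fake_response = [{"sha": "7d88323521194ed8598624dc3a932930debdde1d"}]
print(short_sha(fake_response))  # 7d88323
```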


```python
import re
from pathlib import Path

import requests
from pypdf import PdfReader

owner, repo = "iamcryptoki", "snowden-archive"
target = "documents/2013/20130905-theguardian__bullrun.pdf"
api = "https://api.github.com"

# Resolve the default branch instead of hard-coding "main" or "master".
meta = requests.get(f"{api}/repos/{owner}/{repo}", timeout=30)
meta.raise_for_status()
branch = meta.json()["default_branch"]

# List only the commits that touched this one file (newest first);
# per_page=1 keeps just the latest. This is the file's SHA, not the repo HEAD.
resp = requests.get(
    f"{api}/repos/{owner}/{repo}/commits",
    params={"path": target, "per_page": 1, "sha": branch},
    timeout=30,
)
resp.raise_for_status()
commits = resp.json()
sha = commits[0]["sha"]
sha7 = sha[:7]

# Download the raw PDF bytes from the same branch.
raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{target}"
pdf_path = Path("mirrorfall_bullrun.pdf")
pdf = requests.get(raw_url, timeout=60)
pdf.raise_for_status()
pdf_path.write_bytes(pdf.content)

# Extract the text of every page and normalize curly apostrophes.
text = "\n".join((p.extract_text() or "") for p in PdfReader(str(pdf_path)).pages)
text = text.replace("\u2019", "'")

# Find the line listing the ECIs and take the token right after APERIODIC.
line = next((ln for ln in text.splitlines() if "APERIODIC" in ln), "")
parts = [t.strip(" ,.;:/()\t\r\n").upper() for t in re.split(r"\s+", line) if t.strip()]
idx = parts.index("APERIODIC")
second_eci = parts[idx + 1]

print(f"repo={owner}/{repo}")
print(f"file={target}")
print(f"latest_file_sha={sha}")
print(f"variable_x={sha7}")
print(f"aperiodic_line={line}")
print(f"second_eci={second_eci}")
print(f"normalized={second_eci.lower()}")
```

```shell
python mirrorfall_extract.py
```

```text
repo=iamcryptoki/snowden-archive
file=documents/2013/20130905-theguardian__bullrun.pdf
latest_file_sha=7d88323521194ed8598624dc3a932930debdde1d
variable_x=7d88323
aperiodic_line=APERIODIC,  AMBULANT,
second_eci=AMBULANT
normalized=ambulant
```
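The line-based parse works here because pypdf kept both codewords on one line (`APERIODIC,  AMBULANT,`). If a different extractor or PDF version wrapped them onto separate lines, `parts[idx + 1]` would raise an `IndexError`. A more tolerant sketch, assuming "normalize" means lowercasing as in the run above; the helper name `second_eci` is hypothetical:

```python
import re


def second_eci(text: str, anchor: str = "APERIODIC", length: int = 8) -> str:
    """Return the codeword that follows `anchor`, lowercased.

    The separator pattern accepts commas, runs of spaces and line breaks
    between the two codewords, so a wrapped line still parses.
    """
    m = re.search(rf"\b{re.escape(anchor)}\b[\W_]+([A-Za-z]+)", text)
    if m is None:
        raise ValueError(f"no codeword after {anchor!r}")
    word = m.group(1).lower()
    if len(word) != length:
        raise ValueError(f"expected a {length}-letter codeword, got {word!r}")
    return word


print(second_eci("APERIODIC,  AMBULANT,"))       # ambulant
print(second_eci("ECIs: APERIODIC,\nAMBULANT"))  # ambulant
```

The length check doubles as a guard for the challenge's "8-letter codeword" hint, so a mis-parse fails loudly instead of feeding the wrong word into the embedding step.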

That gives X = 7d88323 and the normalized (lowercased) 8-letter ECI codeword ambulant. The last step is deterministic embedding extraction with all-MiniLM-L6-v2: encode ambulant, take the value at index 0, and round it to 4 decimal places as requested.
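Before trusting index 0, it is cheap to sanity-check what `model.encode()` returned. A minimal sketch, assuming a single string yields one 384-value vector (the dimension the run reports) and that the model's output is L2-normalized, which as far as we know holds for the published all-MiniLM-L6-v2 checkpoint because its pipeline ends in a Normalize module; pass `check_unit_norm=False` if that assumption does not hold for your copy. `first_component` is a hypothetical helper, written in pure Python so it accepts a NumPy array or a plain list:

```python
import math


def first_component(embedding, dim: int = 384, check_unit_norm: bool = True,
                    tol: float = 1e-3) -> float:
    """Validate a single sentence embedding and return its index-0 value."""
    values = [float(v) for v in embedding]  # works for NumPy arrays and lists
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or inf")
    if check_unit_norm:
        # Assumption: the model L2-normalizes its output, so the norm is ~1.
        norm = math.sqrt(math.fsum(v * v for v in values))
        if abs(norm - 1.0) > tol:
            raise ValueError(f"L2 norm is {norm:.4f}, expected ~1.0")
    return values[0]


# Toy vector, not a real embedding: 0.6**2 + 0.8**2 == 1.
toy = [0.6, 0.8] + [0.0] * 382
print(first_component(toy))  # 0.6
```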


```python
import os
import random

import numpy as np
from sentence_transformers import SentenceTransformer

# Turn off Hugging Face tokenizers parallelism (avoids fork warnings).
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
# Seeding is only a precaution: encode() switches the model to eval mode,
# which disables dropout.
random.seed(0)
np.random.seed(0)

codeword = "ambulant"
model = SentenceTransformer("all-MiniLM-L6-v2")
# One string in gives one 1-D vector out (384 values for this model).
embedding = model.encode(codeword, convert_to_numpy=True, show_progress_bar=False)
y = float(embedding[0])  # Variable Y before rounding

print(f"codeword={codeword}")
print(f"embedding_dim={len(embedding)}")
print(f"y_raw={y}")
print(f"y_rounded={y:.4f}")
```

```shell
python mirrorfall_y.py
```

```text
codeword=ambulant
embedding_dim=384
y_raw=0.02446681074798107
y_rounded=0.0245
```
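The raw value 0.024466... sits well away from the 0.02445 rounding boundary, so small floating-point differences between machines or library versions should not flip the fourth decimal. A quick check for that, as a sketch; the helper name and the ±1e-6 tolerance are our own assumptions, not something the challenge specifies:

```python
def rounding_is_stable(y: float, decimals: int = 4, tol: float = 1e-6) -> bool:
    """True if nudging y by +/- tol leaves the rounded string unchanged."""
    def fmt(v: float) -> str:
        return f"{v:.{decimals}f}"

    return fmt(y - tol) == fmt(y) == fmt(y + tol)


print(rounding_is_stable(0.02446681074798107))  # True
print(rounding_is_stable(0.02445))              # False: right on the boundary
```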

Combining both variables as <X>_<Y> gives 7d88323_0.0245, and wrapping that in the apoorvctf{...} flag format gives apoorvctf{7d88323_0.0245}.
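Assembling the flag in one function keeps the formatting of both variables in a single place. A sketch; `build_flag` and its sanity regex (7 lowercase hex characters, an underscore, then a possibly negative number with exactly 4 decimals) are our own assumptions about the format, inferred from the flag above:

```python
import re

FLAG_RE = re.compile(r"apoorvctf\{[0-9a-f]{7}_-?\d+\.\d{4}\}")


def build_flag(x: str, y: float) -> str:
    """Combine Variable X (short SHA) and Variable Y (index-0 value) as
    apoorvctf{<X>_<Y>}, with Y fixed to 4 decimal places."""
    flag = f"apoorvctf{{{x}_{y:.4f}}}"
    if not FLAG_RE.fullmatch(flag):
        raise ValueError(f"unexpected flag shape: {flag}")
    return flag


print(build_flag("7d88323", 0.02446681074798107))  # apoorvctf{7d88323_0.0245}
```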

Solution

```python
import re
import requests
from pathlib import Path
from pypdf import PdfReader

owner, repo = "iamcryptoki", "snowden-archive"
target = "documents/2013/20130905-theguardian__bullrun.pdf"
api = "https://api.github.com"

meta = requests.get(f"{api}/repos/{owner}/{repo}", timeout=30).json()
branch = meta["default_branch"]

commits = requests.get(
    f"{api}/repos/{owner}/{repo}/commits",
    params={"path": target, "per_page": 1, "sha": branch},
    timeout=30,
).json()
sha = commits[0]["sha"]
sha7 = sha[:7]

raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{target}"
pdf_path = Path("mirrorfall_bullrun.pdf")
pdf_path.write_bytes(requests.get(raw_url, timeout=60).content)

text = "\n".join((p.extract_text() or "") for p in PdfReader(str(pdf_path)).pages)
line = next((ln for ln in text.splitlines() if "APERIODIC" in ln), "")
parts = [t.strip(" ,.;:/()\t\r\n").upper() for t in re.split(r"\s+", line) if t.strip()]
second_eci = parts[parts.index("APERIODIC") + 1].lower()

print(sha7)
print(second_eci)
```

```python
import os
import random
import numpy as np
from sentence_transformers import SentenceTransformer

os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
random.seed(0)
np.random.seed(0)

model = SentenceTransformer("all-MiniLM-L6-v2")
embedding = model.encode("ambulant", convert_to_numpy=True, show_progress_bar=False)
print(f"{float(embedding[0]):.4f}")
```

```shell
python mirrorfall_extract.py
python mirrorfall_y.py
```

```text
7d88323
ambulant
0.0245
```

Formatting these as apoorvctf{<X>_<Y>} gives the flag:

apoorvctf{7d88323_0.0245}