Category: AI
Flag: apoorvctf{7d88323_0.0245}
Challenge Description
Your first task is to locate a public archive serving as an archival mirror for the 2013 intelligence disclosures.
Within this archive, locate the raw PDF classification guide dated on September 5, 2013 that corresponds to the overarching US encryption defeat program.
- Variable X: extract the first 7 characters of the latest commit SHA for this exact PDF file. Do not use the repository’s main commit hash.
Download the raw PDF classification guide. Navigate through the dense administrative caveats to Appendix A, which lists the program’s specific capabilities.
- Locate the “Remarks” column corresponding to the list of Exceptionally Controlled Information (ECI) compartments used to protect these details.
- The first ECI listed is APERIODIC. Identify the second ECI compartment listed immediately after it (an 8-letter codeword).
- Normalize this codeword.
Process the extracted ECI codeword through a specific semantic embedding model.
- Initialize the all-MiniLM-L6-v2 model.
- Pass the normalized, 8-letter ECI codeword into the model to generate its tensor embedding array (model.encode()).
- Variable Y: Extract the first floating-point value from the resulting embedding array (Index 0) and round it to 4 decimal places.
Analysis
This one looked like an OSINT-plus-reproducibility trap at first, because the wording pushes you toward broad archive hunting. The solve path becomes clean once you lock onto one concrete mirror and keep every step scripted. I used a small script to query the GitHub API for a known Snowden mirror candidate (iamcryptoki/snowden-archive), download the exact target PDF from raw.githubusercontent.com, and parse the APERIODIC line directly from the extracted text, so there was no manual ambiguity.
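The parsing step described above can be exercised offline before touching the network. The following is a minimal sketch, not the script used for the solve: `codeword_after` is a hypothetical helper name, and the sample text is an assumption modeled on the line the extraction script reports further down. Unlike the full script, it matches APERIODIC as a whole token rather than as a substring of the line.

```python
def codeword_after(text: str, anchor: str = "APERIODIC") -> str:
    """Return the token that follows `anchor` on the first line containing it."""
    for line in text.splitlines():
        # Split on whitespace, trim surrounding punctuation, drop empty tokens.
        tokens = [t.strip(" ,.;:/()").upper() for t in line.split()]
        tokens = [t for t in tokens if t]
        if anchor in tokens:
            idx = tokens.index(anchor)
            if idx + 1 < len(tokens):
                return tokens[idx + 1]
            raise ValueError(f"no token follows {anchor!r} on its line")
    raise ValueError(f"{anchor!r} not found in text")


# Hypothetical sample text standing in for the extracted PDF text.
sample = "Remarks\nAPERIODIC, AMBULANT,\n"
print(codeword_after(sample).lower())
```

Raising `ValueError` instead of returning an empty string makes a failed PDF text extraction obvious, rather than letting an empty codeword flow into the embedding step.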

    # mirrorfall_extract.py
    import re
    from pathlib import Path

    import requests
    from pypdf import PdfReader

    owner, repo = "iamcryptoki", "snowden-archive"
    target = "documents/2013/20130905-theguardian__bullrun.pdf"
    api = "https://api.github.com"

    # Resolve the default branch so the commit lookup and the raw URL agree.
    meta = requests.get(f"{api}/repos/{owner}/{repo}", timeout=30).json()
    branch = meta["default_branch"]

    # Latest commit that touched this file, not the repository head.
    commits = requests.get(
        f"{api}/repos/{owner}/{repo}/commits",
        params={"path": target, "per_page": 1, "sha": branch},
        timeout=30,
    ).json()
    sha = commits[0]["sha"]
    sha7 = sha[:7]

    # Download the raw PDF.
    raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{target}"
    pdf_path = Path("mirrorfall_bullrun.pdf")
    pdf_path.write_bytes(requests.get(raw_url, timeout=60).content)

    # Extract text from every page and normalize curly apostrophes.
    text = "\n".join((p.extract_text() or "") for p in PdfReader(str(pdf_path)).pages)
    text = text.replace("\u2019", "'")

    # Find the APERIODIC line and take the token that follows it.
    line = next((ln for ln in text.splitlines() if "APERIODIC" in ln), "")
    parts = [t.strip(" ,.;:/()\t\r\n").upper() for t in re.split(r"\s+", line) if t.strip()]
    idx = parts.index("APERIODIC")
    second_eci = parts[idx + 1]

    print(f"repo={owner}/{repo}")
    print(f"file={target}")
    print(f"latest_file_sha={sha}")
    print(f"variable_x={sha7}")
    print(f"aperiodic_line={line}")
    print(f"second_eci={second_eci}")
    print(f"normalized={second_eci.lower()}")

    python mirrorfall_extract.py
    repo=iamcryptoki/snowden-archive
    file=documents/2013/20130905-theguardian__bullrun.pdf
    latest_file_sha=7d88323521194ed8598624dc3a932930debdde1d
    variable_x=7d88323
    aperiodic_line=APERIODIC, AMBULANT,
    second_eci=AMBULANT
    normalized=ambulant

That gives X = 7d88323 and the normalized 8-letter ECI codeword ambulant. The final layer was deterministic embedding extraction with all-MiniLM-L6-v2, taking index 0 and rounding to 4 decimals exactly as requested.

    # mirrorfall_y.py
    import os
    import random

    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Pin the environment and seeds so repeated runs behave the same way.
    os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
    random.seed(0)
    np.random.seed(0)

    codeword = "ambulant"
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embedding = model.encode(codeword, convert_to_numpy=True, show_progress_bar=False)

    # Variable Y is the first component of the embedding vector.
    y = float(embedding[0])

    print(f"codeword={codeword}")
    print(f"embedding_dim={len(embedding)}")
    print(f"y_raw={y}")
    print(f"y_rounded={y:.4f}")

    python mirrorfall_y.py
    codeword=ambulant
    embedding_dim=384
    y_raw=0.02446681074798107
    y_rounded=0.0245

Combining both variables as <X>_<Y> gives 7d88323_0.0245, which fits the required flag format.
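The last step is plain string assembly, which can be checked offline from the two values printed above. This is a minimal sketch: `build_flag` and its `prefix` parameter are hypothetical names introduced here for illustration, and the inputs are the full file commit SHA and raw embedding value reported by the two scripts.

```python
def build_flag(file_commit_sha: str, first_component: float, prefix: str = "apoorvctf") -> str:
    """Assemble the flag from the file's commit SHA and embedding index 0."""
    x = file_commit_sha[:7]          # Variable X: first 7 hex characters
    y = f"{first_component:.4f}"     # Variable Y: fixed to 4 decimal places
    return f"{prefix}{{{x}_{y}}}"


# Values taken from the script output above.
print(build_flag("7d88323521194ed8598624dc3a932930debdde1d", 0.02446681074798107))
```

Formatting with `:.4f` rather than calling `round()` keeps trailing zeros, so a value such as 0.0240 would still be rendered with four decimal places.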
Solution
    # mirrorfall_extract.py
    import re
    from pathlib import Path

    import requests
    from pypdf import PdfReader

    owner, repo = "iamcryptoki", "snowden-archive"
    target = "documents/2013/20130905-theguardian__bullrun.pdf"
    api = "https://api.github.com"

    meta = requests.get(f"{api}/repos/{owner}/{repo}", timeout=30).json()
    branch = meta["default_branch"]

    commits = requests.get(
        f"{api}/repos/{owner}/{repo}/commits",
        params={"path": target, "per_page": 1, "sha": branch},
        timeout=30,
    ).json()
    sha = commits[0]["sha"]
    sha7 = sha[:7]

    raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{target}"
    pdf_path = Path("mirrorfall_bullrun.pdf")
    pdf_path.write_bytes(requests.get(raw_url, timeout=60).content)

    text = "\n".join((p.extract_text() or "") for p in PdfReader(str(pdf_path)).pages)
    line = next((ln for ln in text.splitlines() if "APERIODIC" in ln), "")
    parts = [t.strip(" ,.;:/()\t\r\n").upper() for t in re.split(r"\s+", line) if t.strip()]
    second_eci = parts[parts.index("APERIODIC") + 1].lower()

    print(sha7)
    print(second_eci)

    # mirrorfall_y.py
    import os
    import random

    import numpy as np
    from sentence_transformers import SentenceTransformer

    os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
    random.seed(0)
    np.random.seed(0)

    model = SentenceTransformer("all-MiniLM-L6-v2")
    embedding = model.encode("ambulant", convert_to_numpy=True, show_progress_bar=False)
    print(f"{float(embedding[0]):.4f}")

    python mirrorfall_extract.py
    python mirrorfall_y.py

    7d88323
    ambulant
    0.0245

apoorvctf{7d88323_0.0245}