All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- `SwiftEmailValidatorIDNA` companion target: opt-in UTS #46 Unicode IDNA Compatibility Processing on the host portion of the address. Mirrors the `SwiftEmailValidatorUTS39` architecture: imported separately (`import SwiftEmailValidatorIDNA`) so the ~385 KB of UCD-derived data doesn't bundle into callers that don't need it. Provides:
  - `IDNA.toAscii(_:options:)` / `IDNA.toUnicode(_:options:)`: direct UTS #46 §4 Map / NFC / Break / Validate / ToASCII pipeline.
  - `IDNA.domainValidator(_:base:)` factory and convenience overloads `EmailSyntaxValidator.correctlyFormatted(_:idna:)` / `mailbox(from:idna:)` chaining IDNA processing into the existing `domainValidator` slot.
  - `IDNA.Options`: `transitional` (default `false`, matches the post-2016 spec recommendation and modern browsers), `checkHyphens`, `useSTD3ASCIIRules`, `verifyDnsLength`, `checkBidi`, `checkJoiners`, `checkContextO`.
  - Self-contained RFC 3492 Punycode encoder/decoder with overflow guards on every multiply/add.
  - Bundled IDNA mapping table, Bidi_Class, Joining_Type, Virama, and Script tables (Unicode 17.0.0).
  - Generator pipeline at `Sources/SwiftEmailValidatorIDNA/Tools/` (`fetch-ucd.sh` + `generate.py`); regenerate only on UCD upgrades.
- UTS #46 §4 V1-V7 validity criteria all enforced by default: V1 NFC, V2 hyphen rules (leading/trailing + positions 3-4 with `xn--` carve-out), V3 leading combining mark rejection (Mn/Mc/Me), V4/V5 per-scalar status + `UseSTD3ASCIIRules` LDH gate, V6 `CheckBidi` (RFC 5893 §2), V7 `CheckJoiners` (RFC 5892 §A.1 ZWNJ + §A.2 ZWJ).
- `UseSTD3ASCIIRules` enforcement at the validator layer. The modern preprocessed IDNA Mapping Table classifies non-LDH ASCII (`_`, `/`, `:`, `@`, `*`, `<`, `>`, `"`, `$`, `+`, C0/DEL, …) as `valid` with the informational `NV8` tag, so STD3 cannot be inferred from the table status alone. The validator runs an explicit per-scalar LDH check on ASCII scalars after Step 1 mapping (so `U+FF0F` → `U+002F` is also caught) and after `xn--` decoding.
- `VerifyDnsLength` enforcement per UTS #46 §4.2 ToASCII step 5 / RFC 5890 §2.3.1: each A-label 1-63 octets, total domain 1-253 octets excluding any trailing root dot. Applied uniformly to every A-label, not just freshly Punycode-encoded ones, so a 64-char input ASCII label and an oversized `xn--` input are both caught. Empty trailing root labels are rejected when the option is on and preserved when it is off.
- RFC 5893 §2 Bidi rule (UTS #46 V6). Enforced on every label of any "Bidi domain name" (RFC 5893 §1.4: any label contains R/AL/AN). Pure-LTR siblings of RTL labels are also gated. All six conditions implemented (LTR-start vs RTL-start, condition 4 EN/AN exclusivity, trailing-NSM peeling for conditions 3/6).
- RFC 5892 §A.1 / §A.2 CONTEXTJ rules (UTS #46 V7). ZWNJ is allowed when preceded by Virama (CCC=9) or when sandwiched between L|D and R|D Joining_Type runs (T-class scalars transparent on both sides). ZWJ is allowed only when preceded by Virama. Catches ZWJ/ZWNJ used outside legitimate joining contexts (a known homograph attack vector) while preserving legitimate Persian/Indic use.
- RFC 5892 §A.3-§A.9 CONTEXTO rules layered on top of UTS #46 §4 as a security extension (default on, opt-out via `IDNA.Options.checkContextO: false`):
  - §A.3 `U+00B7` MIDDLE DOT only between two `'l'` (Catalan ela geminada).
  - §A.4 `U+0375` GREEK KERAIA only when followed by Greek script.
  - §A.5 / §A.6 `U+05F3` / `U+05F4` HEBREW GERESH / GERSHAYIM only when preceded by Hebrew script.
  - §A.7 `U+30FB` KATAKANA MIDDLE DOT only in labels also containing Hiragana, Katakana, or Han script.
  - §A.8 / §A.9 Arabic-Indic Digits (`U+0660..U+0669`) and Extended Arabic-Indic Digits (`U+06F0..U+06F9`) must not mix in the same label.

  Not required by UTS #46 §4 itself; added as a homograph-defense layer. Disable for strict UTS #46-only conformance.
- `IdnaTestV2DriverTests` conformance driver runs the official Unicode `IdnaTestV2.txt` (v17.0.0) end-to-end across toUnicode, toAsciiN (Nontransitional), and toAsciiT (Transitional). All status code families (`Pn`, `Vn`, `An`, `Bn`, `Cn`, `Xn`, `U1`) are in scope: any row carrying any of them must be rejected by the implementation. CONTEXTO is disabled in the driver because UTS #46 vectors are agnostic to it; CONTEXTO is exercised independently by `ContextOTests`. Driver covers >1000 vectors with 0 failures.
- New per-rule test files: `IDNAProcessingTests`, `BidiRuleTests`, `ContextJTests`, `ContextOTests`, `IDNAIntegrationTests`, `PunycodeTests`. Total IDNA test count: 102.
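
The overflow-guarded Punycode encoder mentioned in the list above can be pictured with a minimal sketch of the RFC 3492 encoding loop. This is not the package's implementation: the function name `punycodeEncode` is hypothetical, and the sketch only shows how `multipliedReportingOverflow` / `addingReportingOverflow` can turn every multiply/add into a checked operation that fails with `nil` instead of trapping.

```swift
/// Minimal RFC 3492 encoder sketch (hypothetical name); returns nil on arithmetic overflow.
func punycodeEncode(_ input: String) -> String? {
    let base = 36, tMin = 1, tMax = 26, skew = 38, damp = 700

    // Digit values 0...25 map to a...z, 26...35 map to 0...9.
    func digit(_ d: Int) -> Character {
        Character(Unicode.Scalar(UInt8(d < 26 ? 97 + d : 22 + d)))
    }
    // Bias adaptation, RFC 3492 section 6.1.
    func adapt(_ rawDelta: Int, _ numPoints: Int, _ first: Bool) -> Int {
        var delta = first ? rawDelta / damp : rawDelta / 2
        delta += delta / numPoints
        var k = 0
        while delta > ((base - tMin) * tMax) / 2 {
            delta /= base - tMin
            k += base
        }
        return k + (base - tMin + 1) * delta / (delta + skew)
    }

    let scalars = input.unicodeScalars.map { Int($0.value) }
    // Basic (ASCII) code points are copied through first, followed by "-".
    var output = scalars.filter { $0 < 0x80 }.map { Character(Unicode.Scalar(UInt8($0))) }
    let basicCount = output.count
    var handled = basicCount
    if basicCount > 0 { output.append("-") }

    var n = 0x80, delta = 0, bias = 72
    while handled < scalars.count {
        // Smallest not-yet-handled code point >= n.
        guard let m = scalars.filter({ $0 >= n }).min() else { return nil }
        let (product, mulOverflow) = (m - n).multipliedReportingOverflow(by: handled + 1)
        let (sum, addOverflow) = delta.addingReportingOverflow(product)
        if mulOverflow || addOverflow { return nil }
        delta = sum
        n = m
        for c in scalars {
            if c < n {
                let (next, overflow) = delta.addingReportingOverflow(1)
                if overflow { return nil }
                delta = next
            } else if c == n {
                // Emit delta as a generalized variable-length integer.
                var q = delta, k = base
                while true {
                    let t = k <= bias ? tMin : (k >= bias + tMax ? tMax : k - bias)
                    if q < t { break }
                    output.append(digit(t + (q - t) % (base - t)))
                    q = (q - t) / (base - t)
                    k += base
                }
                output.append(digit(q))
                bias = adapt(delta, handled + 1, handled == basicCount)
                delta = 0
                handled += 1
            }
        }
        let (next, overflow) = delta.addingReportingOverflow(1)
        if overflow { return nil }
        delta = next
        n += 1
    }
    return String(output)
}
```

For example, `punycodeEncode("b\u{FC}cher")` yields `bcher-kva`; an A-label would prefix that with `xn--`.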
- Mapping table stored as parallel primitive arrays rather than `[(UInt32, UInt32, UInt8, UInt32, UInt8)]` tuples. The tuple-array form ran GitHub-hosted macOS CI runners out of memory during parallel module compilation alongside the UTS #39 data tables; flat arrays let the emit-module phase type-check each homogeneous primitive cheaply.
- `Tools/generate.py` emits five generated data files (`IdnaMapping.swift`, `BidiClass.swift`, `JoiningType.swift`, `Virama.swift`, `Script.swift`). Run only on UCD version upgrades; the generated Swift files are checked into source control.
- Comparison table now grades each adapter within its declared RFC scope. Previously every library was scored against the full corpus including UTS #39 / IDNA / RFC 6531 cases, penalizing libraries that never claimed to handle them. Adapters now declare their scope; the in-scope accuracy column is the headline metric, with a modern-superset column kept alongside for callers who care about beyond-scope coverage.
- Benchmark output refreshed for the IDNA companion target.
- README links the IETF Special-Use Domain RFCs (6761 / 6762 / 7686 / 8375 / 9476) inline and surfaces the IDNA companion target in the intro section.
- `CLAUDE.md` and `SECURITY.md` refreshed for the 1.7.x architecture (three library targets: core / UTS #39 / IDNA).
- DemoApp corpus size reference corrected (150/195 → ~240).
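
As a rough illustration of the parallel-array layout described for the mapping table, the sketch below stores one homogeneous array per column and looks a scalar up by binary search. The type name `RangeTable`, its field names, and the status codes are all hypothetical; the generated tables in the package may be shaped differently.

```swift
/// Hypothetical range table stored as parallel arrays instead of an array of tuples.
struct RangeTable {
    let starts: [UInt32]    // first scalar of each range, ascending
    let ends: [UInt32]      // last scalar of each range (inclusive)
    let statuses: [UInt8]   // one status code per range

    /// Binary search for the range containing `scalar`; nil when no range covers it.
    func status(of scalar: UInt32) -> UInt8? {
        var lo = 0, hi = starts.count - 1
        while lo <= hi {
            let mid = (lo + hi) / 2
            if scalar < starts[mid] {
                hi = mid - 1
            } else if scalar > ends[mid] {
                lo = mid + 1
            } else {
                return statuses[mid]
            }
        }
        return nil
    }
}

// Illustrative data only: digits and a-z "valid" (0), A-Z "mapped" (1).
let table = RangeTable(
    starts:   [0x0030, 0x0041, 0x0061],
    ends:     [0x0039, 0x005A, 0x007A],
    statuses: [0, 1, 0]
)
```

Each literal is a flat array of one primitive type, which is the property the entry credits for keeping type-checking cheap.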
This release closes a series of Unicode and RFC compliance gaps surfaced by adversarial review of 1.6.0. All findings reduce permissiveness; no API surface changed. Users on 1.6.0 should upgrade.
CharacterSet.letters on Darwin admits a number of Default_Ignorable
scalars that produce no glyph and are DISALLOWED in IDNA2008. Several
slipped through both the local-part and domain-label gates in 1.6.0:
- Domain label path now rejects `U+3164` HANGUL FILLER, `U+FE0F` VS-16, `U+E0100` VS-17 (SSP), `U+180B`–`U+180F` MONGOLIAN FREE VARIATION SELECTORS, `U+115F` / `U+1160` HANGUL CHOSEONG/JUNGSEONG FILLERS, `U+17B4`–`U+17B5` KHMER VOWEL INHERENT, and `U+FFA0` HALFWIDTH HANGUL FILLER. PVALID combining marks (e.g. `U+05B0` HEBREW POINT SHEVA) remain accepted; a canary test pins this.
- Local-part path now rejects `U+034F` COMBINING GRAPHEME JOINER, the SMP `U+1BCA0`–`U+1BCA3` SHORTHAND FORMAT controls, the `U+1D173`–`U+1D17A` MUSICAL SYMBOL formatting controls, and the reserved `U+FFF0`–`U+FFF8` block.
- Leading combining marks rejected. A label starting with an Mn/Mc/Me scalar (e.g. lone `U+0301`) is now rejected per-label in `extractDotAtom`. Mid-label combining marks remain accepted so legitimate diacritics (`a` + `U+0301` = `á`) still validate.
- Empty quoted local part rejected. `""@x.com` is now rejected for parity with the dot-atom path. RFC 5321 §3.3 notes the empty local-part is "not generally treated as a deliverable address".
- `.arpa` rejected by `TLDDomainValidator`. RFC 3172 reserves the `.arpa` zone for DNS infrastructure (`in-addr.arpa`, `ip6.arpa`, `iris.arpa`); no MTA accepts mail there. `home.arpa` (RFC 8375) remains in `specialUseDomains` as defensive redundancy.
- `TLDDomainValidator.isPubliclyDeliverable(_:)` two-layer split. Direct callers passing hostnames with surrounding whitespace or IDNA-equivalent dot variants (`U+3002`, `U+FF0E`, `U+FF61`) used to be accepted because the function only inspected the rightmost ASCII-`.`-split label. The public form now trims and folds before dispatching to a raw `_isPubliclyDeliverable(_:)` worker. `EmailSyntaxValidator`'s LDH gate already protected the end-to-end pipeline; this hardens the `TLDDomainValidator` API in isolation.
- IPv6 regex case + leading-zero gaps. `_matchIPv6` accepted embedded-IPv4 octets with leading zeros (e.g. `192.168.001.001`) even though `_matchIPv4` correctly rejects them, so the same input got opposite verdicts depending on host shape (RFC 3986 §3.2.2). The IPv4-mapped prefix was also hardcoded lowercase `ffff`, rejecting `[IPv6:::FFFF:1.2.3.4]` (RFC 4291 §2.2 case-insensitivity). Both fixed; six octet patterns and the `[fF]{4}` prefix updated.
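
The trim-and-fold step described for `isPubliclyDeliverable(_:)` can be sketched as below. The helper names `foldedHost` and `rightmostLabel` are hypothetical and stand in for whatever the public wrapper does internally; only the three dot variants and the whitespace trimming come from the entry above.

```swift
import Foundation

/// Hypothetical pre-processing step: trim whitespace, then fold the
/// IDNA-equivalent full stops to ASCII "." before any label splitting.
func foldedHost(_ raw: String) -> String {
    let dotVariants: Set<Unicode.Scalar> = ["\u{3002}", "\u{FF0E}", "\u{FF61}"]
    let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    var scalars = String.UnicodeScalarView()
    for scalar in trimmed.unicodeScalars {
        scalars.append(dotVariants.contains(scalar) ? "." : scalar)
    }
    return String(scalars)
}

/// Rightmost label after folding, lowercased; nil for an empty host.
func rightmostLabel(_ raw: String) -> String? {
    foldedHost(raw).split(separator: ".").last.map { String($0).lowercased() }
}
```

With folding in place, `rightmostLabel("mail.example\u{FF0E}ORG")` inspects `org` rather than treating `example．ORG` as a single label.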
- 75-octet cap enforced on encoder output. The decoder rejected over-length encoded-words but the encoder did not, so long Unicode inputs auto-encoded then re-decoded silently failed to round-trip. `encode()` now returns `nil` if the assembled `=?utf-8?b?<base64>?=` exceeds 75 chars (≈47 UTF-8 bytes), restoring encode→decode symmetry.
- Base64 residue==1 rejected explicitly. The padding table emitted illegal `===` for 1-byte residues; Foundation rejected the result by accident. Now `encode()` self-checks before relying on Foundation.
- Encoded-text grammar tightened to `1*<text>` per RFC 2047 §2. The third regex group changed from `(.*)` to `([^? ]+)`, rejecting empty payloads (`=?utf-8?b??=`) and literal `?` in encoded-text (`=?iso-8859-1?q?ab?cd?=`).
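
To make the tightened encoded-text grammar concrete, here is a small shape check built around the `([^? ]+)` group quoted above. Everything else in the pattern (the charset and encoding groups, the anchors) and the function name `looksLikeEncodedWord` are assumptions for illustration, not the decoder's actual regex.

```swift
import Foundation

/// Sketch of an encoded-word shape check. Only the third group, `([^? ]+)`,
/// is taken from the changelog entry; the rest of the pattern is assumed.
func looksLikeEncodedWord(_ s: String) -> Bool {
    let pattern = #"^=\?([^? ]+)\?([bBqQ])\?([^? ]+)\?=$"#
    guard s.utf8.count <= 75,   // RFC 2047 section 2 length cap
          let regex = try? NSRegularExpression(pattern: pattern) else { return false }
    let range = NSRange(s.startIndex..., in: s)
    return regex.firstMatch(in: s, options: [], range: range) != nil
}
```

Because the third group requires at least one character and excludes `?`, both the empty payload and the stray-`?` payload from the entry fail to match.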
- §5.2 Moderately Restrictive: second-script pool restricted to UAX #31 Recommended scripts. 1.6.0 enumerated all 174 scripts as candidate partners with Latin (excluding only Cyrl/Grek/Latn/Common/Inherited), so with `rejectRestrictedIdentifiers: false`, Latin + Phoenician / Limbu / etc. were accepted. Replaced with a precomputed `moderatelyRestrictiveCandidateIDs` set built from the 26 Recommended scripts (minus Latn/Cyrl/Grek/Common/Inherited; Cherokee Limited_Use also excluded).
- §5.1 Augmented_Script_Set applied in Single Script analysis. `ScriptAnalyzer.isSingleScript` and `stringCompatible` previously used raw `Script_Extensions(X)`. Without §5.1 augmentation, pure-Japanese (Han + Hira + Kana, no Latin), pure-Korean (Han + Hang), pure-Chinese-with-Bopomofo (Han + Bopo), and even Hira + Kana strings were misclassified as multi-script at `.singleScript`. The Highly Restrictive whitelist papered over Latin-included combos but not the no-Latin variants. Added `augmentedScriptSet(of:)` synthesizing virtual Hanb/Jpan/Kore IDs at `0x10000+` (UCD doesn't ship them as `Script_Extensions` values).
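
The effect of §5.1 augmentation on the Single Script test can be sketched with a toy model. The `Script` enum and both function names here are illustrative only (the entry above says the real code uses numeric script IDs with virtual Hanb/Jpan/Kore values), and Common/Inherited scalars are left out to keep the sketch short.

```swift
/// Illustrative script tags; the real tables use numeric IDs.
enum Script: Hashable { case hani, hira, kana, hang, bopo, latn, hanb, jpan, kore }

/// UTS #39 section 5.1 augmentation of one scalar's Script_Extensions set.
func augmented(_ scripts: Set<Script>) -> Set<Script> {
    var out = scripts
    if scripts.contains(.hani) { out.formUnion([.hanb, .jpan, .kore]) }
    if scripts.contains(.hira) || scripts.contains(.kana) { out.insert(.jpan) }
    if scripts.contains(.hang) { out.insert(.kore) }
    if scripts.contains(.bopo) { out.insert(.hanb) }
    return out
}

/// A string is single-script when the intersection of all augmented sets is non-empty.
/// Common/Inherited scalars (which match every script) are omitted from this sketch.
func isSingleScript(_ perScalarSets: [Set<Script>]) -> Bool {
    guard var common = perScalarSets.first.map(augmented) else { return true }
    for set in perScalarSets.dropFirst() {
        common.formIntersection(augmented(set))
    }
    return !common.isEmpty
}
```

With augmentation, Han + Hira + Kana intersect on the virtual Jpan set; without it, the raw sets would have an empty intersection, which is the misclassification the entry describes.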
- `TLDDomainValidator` two-layer API. Public `isPubliclyDeliverable(_:)` trims surrounding whitespace and folds `U+3002` / `U+FF0E` / `U+FF61` to ASCII `.` before dispatching to `_isPubliclyDeliverable(_:)`. The worker is `public` (with underscore-prefix marking module-internal use) only because Swift requires default-arg symbols of public functions to be public.
- `.asciiWithUnicodeExtension` mode documented as a project convention (whole-address RFC 2047 wrap), not standards-conformant SMTPUTF8: RFC 2047 §5 forbids encoded-words inside `addr-spec`. Steers callers needing arbitrary-MTA interop to `.unicode` mode (RFC 6531).
- `domainLabelCharacterSet` documented as a coarse Letter+digit gate, not RFC 5891 §4.2.3.2 PVALID enforcement; strict IDNA2008 is delegated to the `domainValidator` closure.
- UTS #39 docstrings corrected: the `RestrictionLevel.highlyRestrictive` combos now match §5.2.2 Table 1 (Japanese, Korean, Chinese, not the prior incorrect "Latin + Han, Latin + Han + Hiragana + Katakana, Latin + Han + Hangul + Bopomofo"). Eight other references to non-existent §5.2.1/§5.2.2/§5.2.3 anchors switched to §5.2's named-bullet form.
- UTS #39 out-of-scope sections documented in the `UTS39` namespace docstring: §5.6.1 Whole-Script Confusables, §5.7.1 Mixed-Numbers, and `Identifier_Type=Not_NFKC` pre-normalization (with caller workaround for the NFKC case).
- `EmailSyntaxValidator` domain-length comment rewritten to cite RFC 5321 §4.5.3.1.2 as the headline (255-octet wire cap) and explain how 253 is the derived presentation-form ceiling. Cap value (≤253) unchanged.
- `Tools/generate_tlds.py` switched to PyPI `idna` (IDNA2008 + RFC 3492 Punycode) from stdlib `encodings.idna` (IDNA2003, deprecation-flagged). The generator runs only in maintainer/CI hands, so the new pip dependency has zero consumer impact. The bundled `Generated/IANATLDs.swift` is byte-identical because all 151 IDN TLDs in the IANA root zone round-trip cleanly under both modes today, and the SHA-256 short-circuit detects no-op runs. The nightly workflow installs `idna` before invoking the script.
- Test count grew from 272 to 299 (all passing).
- New coverage includes: Default_Ignorable rejection across BMP + SMP + combining-mark-position vectors, leading combining marks, IPv6 case-insensitivity and embedded-IPv4 leading zeros, RFC 6874 percent-encoded zone IDs, IDN case-fold (`example.БЕЛ`), RFC 2047 encoder structural shape and the 47/48-byte boundary, General-address-literal rejection (RFC 5321 §4.1.3), single-label host rejection through default validator wiring, and IPv4-mapped IPv6 literals end-to-end.
- Multiple weak assertions tightened across the suite (positive equality replacing `XCTAssertNotEqual`, wire-form pins for RFC 2047 round-trips, `XCTAssertEqual` replacing one-way ASCII⇒Unicode invariant).
- DemoApp test corpus extended with 34 Default_Ignorable spoofing cases (33 negative + 1 PVALID canary). `swift run EmailBench` picks them up via the existing symlink.
- `SwiftPublicSuffixList` dependency. The package no longer pulls any third-party Swift dependency. The Public Suffix List was the wrong primitive for email validation: it was designed for cookie scoping and its multi-level / PRIVATE-section entries are policy artifacts of specific registries, with weekly churn driven by non-email concerns.
- `TLDDomainValidator` (new public type). Default domain validator used by `EmailSyntaxValidator`. Confirms the rightmost DNS label is a currently-delegated IANA TLD (ACE `xn--…` and Unicode U-label forms both accepted) and rejects names reserved by the IETF Special-Use Domain Names registry:
  - `.test` (RFC 6761 §6.2)
  - `.example`, `example.com`, `example.net`, `example.org` (RFC 6761 §6.5)
  - `.invalid` (RFC 6761 §6.4)
  - `.localhost` (RFC 6761 §6.3)
  - `.local` (RFC 6762, mDNS)
  - `.onion` (RFC 7686, Tor)
  - `.alt` (RFC 9476)
  - `home.arpa` (RFC 8375)

  Subdomains under any of these are also rejected.
- `Sources/SwiftEmailValidator/Generated/IANATLDs.swift`: bundled IANA TLD set (~1,400 ACE + ~150 U-label entries). Auto-generated; do not edit by hand.
- `Tools/generate_tlds.py`: Python 3 stdlib-only generator that fetches `https://data.iana.org/TLD/tlds-alpha-by-domain.txt`, expands ACE TLDs to U-labels via `encodings.idna.ToUnicode`, and writes the Swift source. Records source URL, fetch timestamp, and SHA-256.
- `.github/workflows/update-tlds.yml`: nightly workflow that refreshes the bundled TLD list and opens a PR if it changed.
- `TLDDomainValidatorTests`: new test class covering real TLDs, fake TLDs, special-use rejection, IDN handling, case insensitivity, trailing root dot, and wiring as the validator default.
- Default `domainValidator` closure on `EmailSyntaxValidator.correctlyFormatted` and `mailbox(from:)` switched from `{ PublicSuffixList.isUnrestricted(PublicSuffixList.ace($0)) }` to `{ TLDDomainValidator.isPubliclyDeliverable($0) }`.
- `UTS39.domainValidator(_:base:)` default base closure likewise switched from PSL to `TLDDomainValidator`.
- `EmailSyntaxValidator.correctlyFormatted(_:uts39:)` and `mailbox(from:uts39:)` convenience overloads likewise switched.
- README & benchmark output rewritten to describe the new default and the rationale for moving off the PSL.
- Drop the dependency: remove `SwiftPublicSuffixList` from your `Package.swift`. SwiftEmailValidator no longer requires it.
- `@example.com` / `@example.net` / `@example.org` now fail the default validator (RFC 6761 §6.5). If your tests or sample addresses used these, switch to a real public domain (`@iana.org` is stable) or pass a permissive `domainValidator: { _ in true }`.
- `@localhost`, `@host.local`, and intranet domains also fail the default. Pass a custom `domainValidator` closure if your application accepts these; see "Domain validation" in the README.
- PSL-based custom rules: if you were calling `PublicSuffixList.isUnrestricted($0, rules: customRules)`, replace with your own closure (the test suite has examples of a simple TLD-allowlist closure in `LocalPartValidatorHookTests`).
- Newly-delegated TLDs: the bundled list ships frozen at the release SHA. The nightly GitHub workflow keeps the canonical copy current; downstream consumers waiting for a tagged release can override `domainValidator` with their own check or run `python3 Tools/generate_tlds.py` and ship the regenerated file.
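
For the migration cases above that call for a custom closure, a TLD-allowlist validator could look roughly like this. It assumes only that `domainValidator` accepts a `(String) -> Bool` closure, as the `{ _ in true }` example suggests; the factory name `tldAllowlistValidator` and the `corp` label are hypothetical.

```swift
/// Builds a `(String) -> Bool` closure that accepts only the listed TLDs (lowercase).
func tldAllowlistValidator(_ allowed: Set<String>) -> (String) -> Bool {
    return { domain in
        // Ignore one trailing root dot, then compare the rightmost label case-insensitively.
        let trimmed = domain.hasSuffix(".") ? String(domain.dropLast()) : domain
        guard let tld = trimmed.split(separator: ".").last else { return false }
        return allowed.contains(String(tld).lowercased())
    }
}

// Hypothetical intranet policy: accept hosts the default validator now rejects.
let intranetValidator = tldAllowlistValidator(["local", "localhost", "corp"])
```

A closure like `intranetValidator` would then be passed as the `domainValidator` argument in place of the default.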
- `SwiftEmailValidatorUTS39` companion library target. New second `.library` product in `Package.swift` that layers UTS #39 Unicode Security Mechanisms on top of the core validator. Callers who don't need it continue to `import SwiftEmailValidator` and pay no size cost; callers who want anti-spoofing add `import SwiftEmailValidatorUTS39` and opt in per call. All ~280 KB of UCD-derived data lives in the addon target. Covers:
  - Identifier_Status filter: rejects Restricted scripts (Linear B, Runic, Deseret, etc.).
  - Mixed-script detection: Single Script / Highly Restrictive / Moderately Restrictive per UTS #39 §5.2, using per-scalar `Script_Extensions ∩ target` intersection semantics.
  - §4 confusable skeletons: skeleton-equality against caller-supplied protected forms. Iterates map + NFD to a fixed point (confusables.txt has 13 non-idempotent entries requiring up to 3 iterations) and handles 48 multi-scalar NFD sources via a longest-match prefix table.
- `localPartValidator` parameter on `EmailSyntaxValidator.correctlyFormatted` and `mailbox(from:)`. Non-escaping closure applied to the semantic local-part string (dot-atom as-is, quoted-string cleaned/unescaped) after RFC parsing succeeds. Default `{ _ in true }` preserves existing behavior; this is the extension point the UTS #39 addon uses. Symmetric with the existing `domainValidator` closure.
- `EmailSyntaxValidator.correctlyFormatted(_:uts39:)` and `mailbox(from:uts39:)` convenience overloads (via extension in the addon target). Wire a `UTS39.Policy` into both the local-part and domain-label validators in one call.
- `UTS39.Policy` struct with four knobs: `level: RestrictionLevel`, `rejectRestrictedIdentifiers`, `rejectConfusables`, and caller-supplied `confusableSkeletons` / `confusableAllowlist` sets.
- `Sources/SwiftEmailValidatorUTS39/Tools/generate.py` + `fetch-ucd.sh`. Manual regeneration pipeline (not build-time) for producing `Data/{IdentifierStatus,Scripts,Confusables}.swift` from UCD 17.0.0. Re-run only on Unicode version upgrades; checked-in Swift files are the source of truth for downstream consumers.
- Test count grew from 164 to 242 (all passing).
- `LocalPartValidatorHookTests` (7 cases) covering the new hook on the main library: default pass-through, rejection surfacing as `nil`, cleaned quoted-string semantic form, and interaction with auto-RFC2047 retry.
- `SwiftEmailValidatorUTS39Tests` (71 cases across 6 files): `IdentifierStatusTests`, `MixedScriptTests`, `RestrictionLevelEdgeCaseTests` (ICU-inspired boundary cases), `ConfusablesTests`, `ConfusablesSkeletonRegressionTests` (walks every entry in the generated confusables table, asserting `skeleton(source) == skeleton(target)`; this is the test that surfaced the non-transitive-closure and multi-scalar NFD bugs during implementation), `DomainLabelTests`, `ConvenienceAPITests`.
- IPv6 literal regex now accepts RFC 4291 §2.2 format 2 (six uncompressed hex groups followed by a trailing IPv4-in-dotted-decimal, e.g. `aaaa:aaaa:aaaa:aaaa:aaaa:aaaa:127.0.0.1`). The upstream regex this validator was derived from only recognised the compressed / IPv4-mapped forms (`::ffff:x.x.x.x`, `1::5:x.x.x.x`). Found by running each competitor library's own test corpus through SwiftEmailValidator; this was the single genuine gap surfaced by that reverse check (the other four disagreements were syntax-vs-policy differences caught by our default `domainValidator`).
- Address `valid.ipv6v4.addr@[IPv6:aaaa:aaaa:aaaa:aaaa:aaaa:aaaa:127.0.0.1]` now validates as expected. Max IPv6 literal length remains 45 octets, already within the `IPAddressSyntaxValidator` public-API length cap, so no guard changes were needed.
- Added `testIPv6Format2UncompressedWithEmbeddedIPv4` and `testIPv6Format2RejectsWrongGroupCount` regression tests; updated the boundary-form test. Test count is now 164.
- `IPAddressSyntaxValidator` public length-capped wrappers. The public `match(_:)`, `matchIPv4(_:)`, and `matchIPv6(_:)` methods now apply a `utf8.count` guard before dispatching to the regex engine: 15 octets for IPv4 (max `255.255.255.255`), 45 octets for IPv6 (max `ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255`). Prior to this release these methods had no input-length bound, so a caller passing a multi-megabyte string would spend O(n) inside `NSRegularExpression` before the trailing `$` anchor failed. That is a potential denial-of-service vector for code paths that expose the validator to untrusted input directly (bypassing `EmailSyntaxValidator`, which already caps the whole address at 254 UTF-8 octets).
- Internal raw matchers `_match(_:)`, `_matchIPv4(_:)`, and `_matchIPv6(_:)` retain the pre-1.4.0 behaviour (no length guard) and are used by `EmailSyntaxValidator.extractHostLiteral` directly; the upstream address cap already bounds the input, so the hot path avoids a redundant second `utf8.count` check.
- `Benchmarks/` SPM package. A new standalone harness runs the 195-case DemoApp corpus through every SPM-consumable Swift email validator we could locate (evanrobertson, MimeEmailParser, bdolewski's regex, jwelton-equivalent via `NSDataDetector`) and emits a Markdown accuracy table. Kept out of the main `Package.swift` so library consumers don't transitively pull the competitors. See the "Comparison with other Swift email validators" section in the README for the published results and the methodology.
- The length-capped public wrappers close the only input-length DoS vector found in a manual audit of the library's public API surface. `EmailSyntaxValidator` users were never exposed (it already caps the input upstream); the vector applied only to callers invoking `IPAddressSyntaxValidator` directly. No crashes were introduced; `EmailSyntaxValidator.correctlyFormatted(_:)` behaviour is unchanged.
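
The wrapper-plus-raw-matcher split can be sketched as follows. The raw matcher here is a hand-written stand-in (the library's own `_matchIPv4(_:)` is regex-based per the entries above), and both function names are hypothetical; the point is only that the `utf8.count` guard runs before any matching work.

```swift
/// Hypothetical raw matcher: strict dotted-decimal IPv4, no leading zeros.
func rawMatchIPv4(_ s: String) -> Bool {
    let parts = s.split(separator: ".", omittingEmptySubsequences: false)
    guard parts.count == 4 else { return false }
    return parts.allSatisfy { part in
        guard (1...3).contains(part.utf8.count),
              part.utf8.allSatisfy({ (48...57).contains($0) }),  // ASCII digits only
              part.utf8.count == 1 || part.first != "0",         // reject "001"
              let value = Int(part) else { return false }
        return value <= 255
    }
}

/// Length-capped public wrapper: bail out before the matcher sees oversized input.
func cappedMatchIPv4(_ s: String) -> Bool {
    guard s.utf8.count <= 15 else { return false }  // "255.255.255.255" is 15 octets
    return rawMatchIPv4(s)
}
```

An internal caller that has already bounded its input can call the raw matcher directly, which mirrors the hot-path reasoning given for `extractHostLiteral`.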
- SwiftPublicSuffixList dependency bumped to 3.1.0. v3.0 tightened `PublicSuffixList.isUnrestricted(_:)` / `match(_:)` to reject non-ASCII hostnames: IDN labels must be in ACE (Punycode) form. The default `domainValidator` closure now calls `PublicSuffixList.ace(_:)` on the domain before dispatching to `isUnrestricted(_:)`, so Unicode IDN domains continue to validate exactly as they did on 1.3.0 with PSL 2.x.
- `Mailbox.Host.domain(...)` still carries the original user-facing string; only the validator dispatch uses the ACE form.
Callers who pass a custom `domainValidator` closure to `correctlyFormatted(_:)` / `mailbox(from:)` and rely on the PSL default behavior via `PublicSuffixList.isUnrestricted(_:)` should wrap their call site with `PublicSuffixList.ace(_:)` if the closure receives Unicode IDN domains, e.g. `{ PublicSuffixList.isUnrestricted(PublicSuffixList.ace($0), rules: myRules) }`.
This release consolidates a series of Unicode and RFC compliance fixes that hardened the validator against spoofing, parser-confusion, and out-of-spec inputs. Earlier 1.1.x versions accept characters and encoded sequences that should be rejected; users should upgrade.
- Zs-category space characters rejected. U+00A0 (NO-BREAK SPACE), U+1680 (OGHAM SPACE MARK), U+2000–U+200A (EN QUAD … HAIR SPACE), U+202F (NARROW NO-BREAK SPACE), U+205F (MEDIUM MATHEMATICAL SPACE), U+3000 (IDEOGRAPHIC SPACE) are visually indistinguishable from U+0020 in most fonts and could be used to register lookalike accounts.
- Reserved format character U+2065 rejected. The previous block (U+2060–U+2064) left U+2065 reachable; the range is now U+2060–U+2065.
- Plane 1–3 supplementary noncharacters rejected. U+1FFFE/U+1FFFF, U+2FFFE/U+2FFFF, U+3FFFE/U+3FFFF (Unicode §23.7 permanently reserved noncharacters) are now blocked via explicit scalar guards.
- Planes 4–13 (U+40000–U+DFFFF) rejected. Entirely unassigned in Unicode; should never appear in interchange.
- Full Supplementary Special-purpose Plane and Supplementary PUA rejected. U+E0000–U+10FFFF blocked as a single guarded range, covering Tags, Variation Selectors Supplement, and Private Use Areas A/B.
- Variation Selectors rejected. U+FE00–U+FE0F are invisible combining characters that produce no glyph (same spoofing risk as ZWJ/ZWNJ).
- U+FDD0–U+FDEF and U+FFFE/U+FFFF rejected. BMP §23.7 permanently reserved noncharacters.
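
The noncharacter and lookalike-space guards listed above reduce to simple scalar-value tests. The sketch below is not the validator's code (which, per later entries, builds `CharacterSet` exclusions plus explicit scalar guards); the function names are hypothetical.

```swift
/// True for the 66 permanently reserved noncharacters:
/// U+FDD0...U+FDEF plus the last two code points of every plane.
func isNoncharacter(_ scalar: Unicode.Scalar) -> Bool {
    let v = scalar.value
    return (0xFDD0...0xFDEF).contains(v) || (v & 0xFFFE) == 0xFFFE
}

/// Zs-category spaces other than U+0020, as enumerated in the list above.
func isLookalikeSpace(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.value {
    case 0x00A0, 0x1680, 0x2000...0x200A, 0x202F, 0x205F, 0x3000:
        return true
    default:
        return false
    }
}
```

The mask test covers U+FFFE/U+FFFF and their counterparts in every supplementary plane (U+1FFFE, U+2FFFF, and so on) with one comparison.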
- DEL (0x7F) rejected from Q-encoded content. The prior `value >= 0x20` guard admitted 0x7F.
- C1 control bytes (0x80–0x9F) rejected from Q-encoded ISO-8859-1/2 content. Previously decoded to U+0080–U+009F C1 controls, which RFC 5198 §2 forbids in network interchange.
- 75-character encoded-word limit enforced per RFC 2047 §2 (the prior limit allowed 76).
- Underscore-as-space decoding in Q encoding per RFC 2047 §4.2.
- Per-scalar character-set check. `extractQuotedString` now uses `unicodeScalars.allSatisfy` instead of `rangeOfCharacter` (which only inspected the first scalar of a grapheme cluster), preventing security-excluded scalars from slipping through as combining elements.
- Escaped quoted-pair restricted to a single ASCII scalar per RFC 5321 (`quoted-pair = "\" (VCHAR / WSP)`).
- Inline scalar guard in `extractQuotedString` matches the dot-atom guard, ensuring identical security posture across both local-part forms.
- Zone identifiers rejected (e.g. `fe80::1%eth0`) per RFC 5321 §4.1.3.
- Empty `IPv6:` literal, non-IPv6 strings after the `IPv6:` tag, and non-standard literal types (e.g. `[SMTP:…]`) are rejected, with regression tests added.
- CharacterSet construction order. Foundation's `CharacterSet` has a bug where calling `.subtracting()` on a set containing supplementary Unicode planes corrupts the supplementary-plane bitmap. All exclusion sets are now subtracted before `supplementaryPlanes` is added via `.union()`. `unicodeNonCharacters`, `unicodeSpaceChars`, and `zeroWidthAndInvisibleChars` were updated to follow this rule.
- Domain octet limits enforced (RFC 1035 §2.3.4): per-label ≤63 octets, total domain ≤253 octets, measured in UTF-8 bytes, not character count.
- Total email length enforced at 254 UTF-8 bytes (RFC 5321 §4.5.3.1.3).
- Quoted-string local part length enforced at 64 UTF-8 bytes.
- ASCII-only domain labels in `.ascii` mode. Unicode U-labels are now rejected when `compatibility == .ascii`; Punycode (`xn--…`) labels remain accepted.
- Source-route addresses rejected (`@relay.host:user@domain`).
- Empty host, double-`@`, empty domain labels, leading/trailing dots and hyphens in domain labels are rejected regardless of the `domainValidator` closure.
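
The byte-versus-character distinction in the octet limits above is easy to get wrong, so here is a minimal sketch. The function name is hypothetical, and the sketch deliberately treats any empty label (including one produced by a trailing dot) as a failure.

```swift
/// Checks per-label and whole-domain limits in UTF-8 octets (not Characters).
/// Any empty label, including one left by a trailing dot, fails in this sketch.
func withinDomainOctetLimits(_ domain: String) -> Bool {
    guard (1...253).contains(domain.utf8.count) else { return false }
    let labels = domain.split(separator: ".", omittingEmptySubsequences: false)
    return labels.allSatisfy { (1...63).contains($0.utf8.count) }
}
```

A label of 32 `é` characters is only 32 `Character`s long but 64 UTF-8 octets, so it fails the per-label limit even though a character count would have let it through.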
- `mailbox(from:)` and `correctlyFormatted(_:)` doc comments corrected. Prior comments referenced a `strategy:` parameter and `ValidationStrategy` type that do not exist. Documentation now matches the actual `options:` parameter and `Options` enum.
- Test count grew from 77 to 125 (all passing).
- New thematic coverage for: Zs space spoofing, supplementary noncharacters, Variation Selectors, the Unicode Tags block, source routes, address-literal edge cases, byte-vs-character length boundaries, RFC 2047 75-char limit and C1 rejection.
- EmailSyntaxValidator: Class-level documentation with usage examples
- Mailbox: Struct and property documentation (`email`, `localPart`, `host`)
- LocalPart/Host enums: Case documentation for `dotAtom`, `quotedString`, `domain`, `addressLiteral`
- Options enum: Documentation for `autoEncodeToRfc2047` option
- Compatibility enum: Detailed documentation for `ascii`, `asciiWithUnicodeExtension`, `unicode` modes
- RFC2047Coder: Class documentation with encoding examples, `encode()` and `decode()` method documentation
- IPAddressSyntaxValidator: Class documentation explaining RFC 5321 context
EmailSyntaxValidatorTests.swift

- `testLocalPartExactly63Characters` - Boundary test for 63-character local part
- `testLocalPartExactlyOneCharacter` - Minimum valid local part
- `testLocalPartEmptyString` - Empty local part rejection
- `testUnicodeLocalPartCharacterVsByteCount` - 30 four-byte Unicode chars (120 bytes, 30 chars)
- `testUnicodeLocalPartExceeds64Characters` - 65+ Unicode character rejection
- `testEmojiInLocalPart` - Emoji validation in Unicode mode
- `testCombiningMarksInLocalPart` - Diacritics and combining characters
- `testHighUnicodeRanges` - Characters beyond BMP (U+1D400+)
- `testZeroWidthCharacters` - ZWSP, ZWJ, ZWNJ handling
- `testBidirectionalOverrideCharacters` - RTL/LTR control character rejection
- `testC1ControlCharactersRejected` - C1 control character rejection (U+0080-U+009F)
- `testRFC2047EncodedWithIPv4AddressLiteral` - RFC2047 with IPv4 literal
- `testRFC2047EncodedWithIPv6AddressLiteral` - RFC2047 with IPv6 literal
- `testQuotedStringWithMultipleAtSymbols` - Multiple @ in quoted strings
- `testQuotedStringWithRFC2047Decoding` - RFC2047 decoded quoted strings
- `testAutoEncodeToRfc2047WithAddressLiteral` - Combined options testing
- `testCustomDomainValidatorAcceptsAnyDomain` - Permissive validator
- `testCustomDomainValidatorRejectsAllDomains` - Restrictive validator
- `testCustomDomainValidatorWithSpecificTLDs` - TLD-specific validation
- `testCustomDomainValidatorReceivesCorrectDomain` - Domain parameter verification
- `testCustomDomainValidatorWithUnicodeDomain` - IDN domain handling
- `testMultipleDotsInVariousPositions` - Valid multi-dot local parts
- `testSingleCharactersBetweenDots` - Minimal segments between dots
- `testMaxConsecutiveSpecialCharacters` - Consecutive special characters
- `testSpecialCharactersAtBoundaries` - Special chars at start/end of segments
- `testExtremelyLongLocalPart` - 1000 character local part rejection
- `testExtremelyLongDomain` - 500+ character domain handling
- `testVeryLongRFC2047EncodedString` - Near 76-char limit RFC2047
- `testManyUnicodeCharactersInLocalPart` - 64 diverse Unicode characters

RFC2047CoderTests.swift

- `testDecodingUTF16B` - Base64 with UTF-16 charset
- `testDecodingUTF32B` - Base64 with UTF-32 charset
- `testDecodingUTF16InvalidData` - Malformed UTF-16 rejection
- `testDecodingUTF32InvalidData` - Malformed UTF-32 rejection
- `testEncodeDecodeRoundTripSimpleASCII` - ASCII round-trip
- `testEncodeDecodeRoundTripUnicode` - Unicode round-trip
- `testEncodeDecodeRoundTripSpecialCharacters` - Special character round-trip
- `testDecodingLatin2QPolishCharacters` - Polish special characters
- `testDecodingLatin2QCzechCharacters` - Czech special characters
- `testDecodingLatin2InvalidControlCharacter` - Invalid byte handling
- `testEncodeEmptyString` - Empty string encoding
- `testDecodeWithMixedCaseCharset` - Case-insensitive charset
- `testDecodeWithMixedCaseEncoding` - Case-insensitive encoding type
- `testDecodeWithWhitespaceInEncodedWord` - Whitespace handling

IPAddressValidatorTests.swift

- `testIPv6ZoneIdentifiers` - Zone identifier rejection per RFC 5321
- `testIPv6LoopbackVariants` - `::1` variations
- `testIPv4MappedIPv6Extended` - `::ffff:` mapped addresses
- `testIPv4LeadingZeros` - Leading zeros handling
- `testEmptyIPAddressStrings` - Empty/whitespace rejection
- EmailSyntaxValidator.swift: Reordered CharacterSet construction to work around a Foundation bug where `.subtracting()` corrupts supplementary Unicode plane data. Supplementary planes (U+10000-U+10FFFF) are now added last, after all subtractions.
- IPAddressSyntaxValidator.swift: IPv6 zone identifiers (e.g., `fe80::1%eth0`) are now correctly rejected. Per RFC 5321 Section 4.1.3, zone identifiers are not valid in email address literals.
- EmailSyntaxValidator.swift: C1 control characters (U+0080-U+009F) are now rejected in Unicode mode. Per RFC 5198 Section 2, these control characters should be avoided in network interchange.
- EmailSyntaxValidator.swift: Fixed supplementary Unicode plane support (U+10000-U+10FFFF). Emoji, mathematical symbols, and other characters beyond the Basic Multilingual Plane now correctly validate in Unicode mode.
- EmailSyntaxValidator.swift: Bidirectional formatting characters are now rejected:
  - Left-to-Right Mark / Right-to-Left Mark (U+200E-U+200F)
  - Directional embeddings and overrides (U+202A-U+202E)
  - Directional isolates (U+2066-U+2069)
  - Deprecated format characters (U+206A-U+206F)

  These characters can be exploited for homograph attacks and email spoofing.
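
The four ranges above translate directly into a per-scalar check. This sketch uses hypothetical function names and is independent of how the validator builds its exclusion sets.

```swift
/// True for the bidirectional formatting ranges listed above.
func isBidiFormatting(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.value {
    case 0x200E...0x200F,   // LRM / RLM
         0x202A...0x202E,   // embeddings and overrides
         0x2066...0x2069,   // isolates
         0x206A...0x206F:   // deprecated format characters
        return true
    default:
        return false
    }
}

/// Scans every scalar, so a control hidden inside a grapheme cluster is still found.
func containsBidiFormatting(_ s: String) -> Bool {
    s.unicodeScalars.contains(where: isBidiFormatting)
}
```

A local part such as `"user\u{202E}moc.elpmaxe"` renders reversed after the override character, which is the kind of spoofing this rejection is meant to stop.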
Foundation's CharacterSet has a bug where calling .subtracting() on a set that includes supplementary Unicode planes (U+10000+) corrupts the supplementary plane data, even when the subtracted characters don't overlap. The workaround is to add supplementary planes as the final .union() call, after all .subtracting() operations are complete.
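```swift
import Foundation

// Illustrative stand-ins only; the validator's real sets are larger.
let baseSet = CharacterSet.alphanumerics
let c1Controls = CharacterSet(charactersIn: "\u{80}"..."\u{9F}")
let supplementaryPlanes = CharacterSet(charactersIn: "\u{10000}"..."\u{10FFFF}")

// WRONG: supplementary planes get corrupted by subsequent subtractions
let corruptedCharset = baseSet
    .union(supplementaryPlanes)   // Added here...
    .subtracting(c1Controls)      // ...corrupted here

// CORRECT: add supplementary planes last
let charset = baseSet
    .subtracting(c1Controls)      // All subtractions first
    .union(supplementaryPlanes)   // Add supplementary planes last
```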