Merging PDFs that mix A4, Letter, Legal, and custom page sizes without distortion is a nuanced challenge. This deep-dive guide covers the PDF MediaBox specification, scaling transformation matrices, layout normalisation rules, and how MojoDocs' WebAssembly engine handles every size variant locally—without uploading a single byte to the cloud.
The Indian administrative and professional document ecosystem is a patchwork of incompatible page sizes. A home loan dossier for a nationalised bank might combine A4-sized UIDAI Aadhaar scans, US Letter-format transcripts from an overseas university, Legal-size property deed printouts, and the compact booklet-page dimensions of an MEA Passport photocopy. When you attempt to merge these into a single coherent PDF using a generic tool, the result is often a chaotic mix of clipped content, blank white margins, and rescaled text that looks nothing like the originals. Fixing these issues is not trivial. It requires a deep understanding of the PDF page geometry model, affine transformation matrices, and the layout-normalisation rules that govern how a merge engine should behave when it encounters mixed dimensions. This guide explains every aspect of that process and shows how you can perform it securely, locally, and for free using the MojoDocs PDF Merger.
Before diving into the technical architecture, it is worth understanding why the problem exists at all. Most everyday users assume that a PDF is just a digital sheet of paper and that merging two PDFs is as simple as stapling two stacks together. In reality, every PDF page is a self-contained coordinate system with its own declared canvas size, and when two pages with different declared sizes collide inside a single document container, the reader application must decide how to render them. Without an explicit normalisation rule encoded in the merged file, different PDF viewers will make different rendering decisions—some will scale to fit, others will clip the larger page, and others will leave large white borders around the smaller one. None of these outcomes is predictable or professional.
1. The PDF Page Geometry Model: MediaBox, CropBox, BleedBox, TrimBox, and ArtBox
To understand the page-size problem, you must first understand how the PDF specification (ISO 32000-2:2020) defines the geometry of a page. Every page object in a PDF can carry up to five nested bounding-box records, each serving a distinct purpose in the document lifecycle. These are defined as arrays of four numbers: [x_min, y_min, x_max, y_max], measured in PDF user units (1 unit = 1/72 inch by default).
1.1 MediaBox — The Physical Canvas
The /MediaBox is the master bounding box. It defines the full physical size of the medium on which the page will be rendered. This is the property that determines what most people call the "page size." Common standard values include:
- A4 Portrait: [0 0 595.276 841.890]. The ISO 216 international standard, mandatory for most Indian government portals including Income Tax e-Filing, MCA21, and UIDAI forms.
- US Letter Portrait: [0 0 612 792]. The North American standard, common in PDFs generated by US universities, US law firms, or Microsoft Word installations configured in English (US) locale.
- Legal Portrait: [0 0 612 1008]. Used extensively for property deeds, affidavits, and court documents in states that follow older printing conventions.
- A3 Landscape: [0 0 1190.551 841.890]. Used for architectural drawings, blueprint sheets, and large-format engineering diagrams.
- MEA Passport Booklet Page: typically around [0 0 340 240] or a similar compact custom size, depending on the scanning configuration.
- Parivahan RC Booklet Page: scanned pages from the physical Registration Certificate booklet are often non-standard, around [0 0 510 360] depending on the scanner DPI and physical book dimensions.
When a PDF viewer loads a page, it allocates a rendering canvas of exactly the MediaBox size. If content stream drawing commands place ink outside this canvas, the viewer simply does not render those strokes—the content is clipped invisibly. This is the primary cause of the "cut-off content" problem users encounter when merging mixed-size PDFs without a proper scaling step.
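As a rough illustration of how a tool might label the MediaBox values listed above, the following Python sketch matches a box against a small table of standard sizes. The function name `classify_media_box`, the size table, and the 1-point tolerance are assumptions made for this example, not part of any particular library.

```python
# Standard sizes in PDF points (1 pt = 1/72 inch), stored as (short side, long side).
STANDARD_SIZES = {
    "A4": (595.276, 841.890),
    "Letter": (612.0, 792.0),
    "Legal": (612.0, 1008.0),
    "A3": (841.890, 1190.551),
}


def classify_media_box(media_box, tolerance=1.0):
    """Return (size_name, orientation) for a [x_min, y_min, x_max, y_max] box.

    The tolerance (in points) is an illustrative choice that absorbs rounding
    such as 595 vs 595.276.
    """
    x_min, y_min, x_max, y_max = media_box
    width, height = x_max - x_min, y_max - y_min
    orientation = "landscape" if width > height else "portrait"
    short, long_ = sorted((width, height))
    for name, (std_short, std_long) in STANDARD_SIZES.items():
        if abs(short - std_short) <= tolerance and abs(long_ - std_long) <= tolerance:
            return name, orientation
    return "Custom", orientation
```

Comparing the sorted sides means a landscape A3 box and a portrait A3 box both resolve to "A3", with the orientation reported separately.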
1.2 CropBox — The Visible Window
The /CropBox defines the region of the page that is visible in a normal PDF viewer. If a CropBox is defined and is smaller than the MediaBox, the viewer will display only the cropped area. Scanners that produce oversized scans and then apply a software crop often set a CropBox that is smaller than the MediaBox. When merging, a naive engine may read only the MediaBox and ignore the CropBox, accidentally restoring the invisible border content and making the document look uncropped.
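A CropBox-aware engine could resolve the visible region along the following lines. This is a minimal sketch: it assumes the CropBox defaults to the MediaBox when absent and is clipped to the MediaBox when it extends beyond it. The function name `visible_box` and the fallback for an empty intersection are choices made for this example.

```python
def visible_box(media_box, crop_box=None):
    """Return the region a viewer would display, as (x0, y0, x1, y1)."""
    if crop_box is None:
        # No CropBox declared: the whole MediaBox is visible.
        return tuple(media_box)
    # Clip the CropBox to the MediaBox.
    x0 = max(media_box[0], crop_box[0])
    y0 = max(media_box[1], crop_box[1])
    x1 = min(media_box[2], crop_box[2])
    y1 = min(media_box[3], crop_box[3])
    if x1 <= x0 or y1 <= y0:
        # Assumption for this sketch: an empty intersection falls back to the MediaBox.
        return tuple(media_box)
    return (x0, y0, x1, y1)
```

Scale factors would then be computed from the width and height of this visible box rather than from the raw MediaBox.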
1.3 TrimBox, BleedBox, ArtBox
These three boxes are used primarily in commercial print workflows. The /TrimBox defines where the page should be physically cut after printing. The /BleedBox extends slightly beyond the TrimBox to accommodate printer registration variance. The /ArtBox marks the area of meaningful content. For most end-user document merging tasks—like combining bank statements or government forms—these boxes are rarely present. However, PDFs generated from professional design tools like Adobe InDesign or Affinity Publisher will include them, and a merge engine that blindly reads the largest declared box without checking CropBox priority will produce unexpected layout results.
Pro Tip: When merging PDF scans from Parivahan DL/RC booklets or MEA Passport copies, always verify the page dimensions of your source files using a local tool before merging. If your scanned PDFs have oversized MediaBox values with a CropBox clipping them down, a merge engine must be CropBox-aware—otherwise the combined document will display large white borders around your content. MojoDocs reads and respects the CropBox hierarchy automatically.
2. Why Simple Concatenation Fails with Mixed Page Sizes
A PDF file is not a sequential stream of pages. It is a structured object database with a cross-reference table (xref), a document catalog (/Root), and a page tree (/Pages). When you merge two PDFs at the byte level by simple concatenation, you create two overlapping object ID namespaces, two conflicting xref tables, and two separate root catalogs. No standard PDF viewer can parse this; the result is a corrupted file. Even tools that correctly re-index object IDs and rebuild the page tree will fail silently on mixed-size documents if they do not also resolve the MediaBox conflicts.
Here is a concrete example of the problem. Suppose you are merging a two-page A4 bank statement from HDFC Bank with a three-page Legal-size property deed. The A4 pages have a MediaBox of [0 0 595 842] and the Legal pages have a MediaBox of [0 0 612 1008]. A naive merge tool will create a combined page tree where the first two pages have a declared canvas of 595 × 842 units and the last three pages have a canvas of 612 × 1008 units. When the merged document is opened in a PDF viewer, the viewer will render each page at its own declared size. The result is a document where consecutive pages jump in visual size—the user has to scroll differently and resize the view for every page change. In a printed version, the printer driver will try to normalise these to its default paper size, potentially cropping or stretching content.
For professional submissions—say, a home loan application to a nationalised bank where the compliance officer is reviewing a printed copy—a document with inconsistent page sizes looks unprofessional and may even be flagged as improperly prepared, causing delays or rejection of your application.
3. The Three Layout Normalisation Strategies
A professional PDF merge engine must implement explicit layout rules for handling mixed-size pages. There are three main strategies, each suited to different use cases:
Strategy A: Preserve Original Sizes (No Normalisation)
In this mode, every page retains its own MediaBox exactly as declared in the source document. No scaling, translation, or transformation is applied. The merged document will have pages of mixed sizes. This is the correct choice when:
- You need to preserve exact page dimensions for print-accurate documents (e.g., architectural drawings).
- The recipient's PDF viewer or printing pipeline is already configured to handle mixed-size documents.
- You are archiving source documents and need to preserve the exact byte-level geometry of each original.
The preserve strategy is the simplest to implement algorithmically but produces the most inconsistent visual experience in standard readers. Most Indian government portals that require a single combined PDF—such as the MEA Passport portal or the UIDAI Aadhaar update interface—expect a consistently sized document (A4), making this strategy inappropriate for official submissions.
Strategy B: Normalise All Pages to a Target Size (e.g., A4)
In this mode, the merge engine selects a target page size (typically A4 for Indian contexts or US Letter for US-context documents) and applies a scaling transformation to every page whose MediaBox differs from the target. This is the most useful strategy for government portal submissions, professional reports, and email attachments, as it produces a visually consistent document where every page occupies the same canvas size.
The normalisation process involves computing a scale factor and centering offset for each non-conforming page. Specifically, for a source page of width W and height H being normalised to a target of width T_w and height T_h, the scaling factor is min(T_w / W, T_h / H). This "fit-to-canvas" scaling preserves the aspect ratio, preventing distortion, and centres the scaled content within the target canvas by computing translation offsets of (T_w - scale × W) / 2 horizontally and (T_h - scale × H) / 2 vertically.
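The fit-to-canvas rule described above can be sketched in a few lines of Python. The function name `fit_to_canvas` and its argument order are illustrative; the arithmetic follows the formula in the text.

```python
def fit_to_canvas(src_w, src_h, target_w, target_h):
    """Return (scale, translate_x, translate_y) for aspect-preserving, centred scaling."""
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source page dimensions must be positive")
    # The smaller ratio is the limiting dimension, so the page never overflows.
    scale = min(target_w / src_w, target_h / src_h)
    translate_x = (target_w - scale * src_w) / 2
    translate_y = (target_h - scale * src_h) / 2
    return scale, translate_x, translate_y


# Example: a compact 396 x 252 pt page placed on an A4 canvas.
scale, tx, ty = fit_to_canvas(396, 252, 595.276, 841.890)
```

In this example the width is the limiting dimension, so the horizontal offset is zero and the leftover space is split evenly above and below the scaled content.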
Strategy C: Normalise All Pages to the Largest Page in the Set
In this mode, the engine scans all input pages, identifies the largest MediaBox (by area), and scales all smaller pages up to match that canvas. This prevents any content from being made smaller than its original, which is valuable when you cannot afford to reduce the visual size of fine-print legal text or detailed engineering diagrams. The trade-off is that documents originally sized for A5 or compact booklet pages may appear with significant empty white space when scaled up to Legal or A3 dimensions.
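Selecting the target canvas for this strategy might look like the sketch below, which picks the page with the greatest area from a list of (width, height) pairs. The function name `largest_page` is hypothetical.

```python
def largest_page(sizes):
    """Return the (width, height) pair with the greatest area."""
    if not sizes:
        raise ValueError("at least one page size is required")
    return max(sizes, key=lambda size: size[0] * size[1])


pages = [(595.276, 841.890), (612, 1008), (396, 252)]
target = largest_page(pages)
```

One design caveat worth keeping in mind: the largest page by area is not necessarily the widest or the tallest. When the set mixes portrait and landscape pages, an individual page can still end up with a fit-to-canvas scale below 1, so an implementation may want to compare effective widths and heights as well.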
4. The Mathematics of Affine Transformation: Scaling PDF Content Streams
The most technically complex aspect of size-aware PDF merging is the actual scaling of the content stream. A PDF page's visual content is described by a sequence of PostScript-like drawing operators embedded in a (usually compressed) stream, the /Contents object. These operators include commands like Tm (set text matrix), cm (concatenate matrix), re (append a rectangle to the current path), and Do (paint an XObject such as an image). Every coordinate in these commands is expressed in the page's own user-space coordinate system, whose origin sits at the bottom-left corner of the MediaBox whenever the box starts at [0 0], as it does in all the examples in this guide.
To scale a page's content to fit a new canvas size, the merge engine cannot simply resize the MediaBox rectangle—that would only change the declared canvas size without moving the ink. The engine must prepend a current transformation matrix (CTM) modification to the content stream. In PDF syntax, this is done by inserting a cm operator at the very beginning of the content stream:
q
scale_x 0 0 scale_y translate_x translate_y cm
[... original content stream operators ...]
Q
The q and Q operators save and restore the graphics state, ensuring the transformation is isolated to this page's content. The six numbers in the cm operator represent an affine transformation matrix in the form [a b c d e f], where for a pure scale-and-translate operation: a = horizontal scale factor, d = vertical scale factor, e = horizontal translation in points, and f = vertical translation in points. b and c are zero for non-rotated transformations.
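To make the wrapping step concrete, here is a small Python sketch that builds the operator prefix and suffix around an already decoded content stream. The helper names `format_number` and `wrap_with_transform` are invented for this example, and uniform scaling (a = d) is assumed.

```python
def format_number(value):
    """Format a number compactly, without exponent notation."""
    text = f"{value:.4f}".rstrip("0").rstrip(".")
    return "0" if text in ("", "-0") else text


def wrap_with_transform(content, scale, translate_x, translate_y):
    """Wrap decoded content-stream bytes in q ... cm ... Q."""
    matrix = " ".join(
        format_number(v) for v in (scale, 0, 0, scale, translate_x, translate_y)
    )
    return b"q\n" + matrix.encode("ascii") + b" cm\n" + content + b"\nQ\n"


wrapped = wrap_with_transform(b"BT ET", 0.5, 10, 20)
# wrapped == b"q\n0.5 0 0 0.5 10 20 cm\nBT ET\nQ\n"
```

The sketch assumes the original stream has balanced q/Q pairs. Because a page's /Contents entry may also be an array of streams, an engine could alternatively add one small stream before and one after the original instead of rewriting it.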
For example, scaling an A4 page (595.276 × 841.890) to fit inside a US Legal canvas (612 × 1008) while preserving aspect ratio and centering would produce the following matrix computation:
scale_factor = min(612 / 595.276, 1008 / 841.890) = min(1.0281, 1.1973) = 1.0281
translate_x = (612 - 1.0281 * 595.276) / 2 = (612 - 612) / 2 = 0
translate_y = (1008 - 1.0281 * 841.890) / 2 = (1008 - 865.54) / 2 = 71.23
cm operator: 1.0281 0 0 1.0281 0 71.23 cm
This matrix causes the page content to be scaled up by 2.81% and shifted upward by 71.23 points (about 1 inch) to centre it vertically within the Legal canvas. The width is the limiting dimension here, so the scaled page fills the canvas horizontally and the remaining space is split between the top and bottom margins. Because this operation modifies the compressed content stream byte data, it requires a decode-transform-encode cycle for every scaled page, which is computationally expensive. This is one reason traditional cloud PDF tools either skip this step (producing inconsistent layouts) or require you to pay for processing time on their servers. MojoDocs' WebAssembly engine performs this computation locally on your device's CPU at near-native speed.
5. The Data Sovereignty Imperative: Why Cloud Mergers Are Dangerous
The Indian digital identity infrastructure means that almost every important document merge involves files that contain the most sensitive personal data imaginable. Consider what a typical home loan document package contains:
- UIDAI Aadhaar Card: Your full legal name, date of birth, gender, residential address, and the unique 12-digit Aadhaar number linked to your biometric records. Exposure of this number can enable fraudulent SIM card applications, e-KYC scams, and unauthorised Aadhaar e-sign operations.
- NSDL PAN Card: Your Permanent Account Number, directly linked to your Income Tax profile, Form 26AS tax credit statement, and all your bank accounts. PAN exposure enables tax fraud, unauthorised loan applications, and shell company registrations in your name.
- MEA Passport: Your passport number, place of birth, issuing authority, visa history, and signature. Passport data is used by international fraudsters to forge travel documents and apply for foreign visas.
- Parivahan DL/RC: Your driving licence number, vehicle registration number, engine number, chassis number, and insurance policy number. RC data is actively misused in vehicle cloning and insurance fraud schemes.
- Bank Statements: Your salary credits, vendor payments, and investment transactions—a complete picture of your financial behaviour for identity-targeted social engineering attacks.
When you upload this package to a cloud PDF merger, you are transmitting all of this data to servers you do not control, governed by privacy policies you may not have read, hosted in data centres that may be located outside India. The Digital Personal Data Protection (DPDP) Act 2023 establishes that data fiduciaries must implement appropriate technical and organisational measures to protect personal data. Using a foreign cloud processor for documents containing Aadhaar, PAN, and bank data may not meet those standards, and the legal liability ultimately falls on the individual or organisation that chose that processor.
The alternative is categorically simple: process the files locally. MojoDocs' WebAssembly-powered merger never transmits your documents over the network. The WASM binary is downloaded once to your browser cache. After that, every merge operation—including the complex mixed-size scaling described in this article—executes entirely within your browser's memory sandbox on your own device. No server, no upload, no risk.
| Method | Cost | Privacy |
|---|---|---|
| Adobe Acrobat Pro (per user/month subscription) | ~₹1,593/month (~₹19,116/year per user) | Medium — local processing but cloud sync features are prompted aggressively |
| iLovePDF / Smallpdf Cloud Paid Plan | ~₹830/month (~₹9,960/year per user) | Low — every file is uploaded to EU/US servers for processing |
| Local Xerox / Cyber Cafe Operator | ₹10–₹20/page; ₹200–₹500 per complex submission | Critical Risk — documents saved on shared public machines, PAN/Aadhaar exposed |
| Blinkit Print / Zepto Print Service | ₹5–₹15/page for print delivery; no digital merge capability | Medium — print only, no cloud upload, but file sent to vendor print queue |
| MojoDocs PDF Merger (WebAssembly, Client-Side) | ₹0 — Free forever, unlimited files, no account required | Maximum — 100% local processing, zero server uploads, verifiable offline |
For a law firm with 20 professionals each processing five document packages per week, the annual saving by switching from Adobe Acrobat Pro to MojoDocs is approximately ₹3,82,320 per year (20 users × ₹19,116/year). Beyond cost, the complete elimination of cloud upload risk for client-privileged documents—which under the Bar Council of India's rules must be kept strictly confidential—makes the local-first approach a professional imperative.
6. Real-World Indian Document Size Conflicts and How to Resolve Them
Let us walk through six concrete scenarios that Indian users regularly encounter when merging PDFs with different page sizes, and explain the correct strategy for each.
Scenario 1: Home Loan KYC Package (UIDAI Aadhaar + NSDL PAN + Bank Statement)
Most private sector banks require a single consolidated KYC PDF for home loan applications. The typical components are:
- Aadhaar card PDF downloaded from the UIDAI SSUP portal — A4 size (595 × 842 pt), portrait orientation.
- PAN card PDF downloaded from the NSDL e-PAN portal — typically a compact custom size approximately 396 × 252 pt (equivalent to a credit-card-sized card on a smaller page).
- Last six months of bank statements downloaded from net banking — A4 size, portrait, often 20–40 pages.
Recommended Strategy: Normalise all pages to A4. The PAN card page will be scaled up and centred on an A4 canvas, maintaining its aspect ratio. Bank statement pages require no scaling. The result is a consistently A4 document ready for direct upload to the bank portal's KYC submission interface.
Scenario 2: Property Purchase Due Diligence (Sale Deed + Encumbrance Certificate + Property Tax Receipts)
Property transaction documents are notorious for size inconsistency. Sale deeds registered at the Sub-Registrar's office are printed on Legal-size paper (612 × 1008 pt) because state governments print in that format. Encumbrance certificates from the registration department may be A4 or Legal. Property tax receipts from municipal corporations are often A5 (420 × 595 pt) thermal-printed pages scanned on an A4 flatbed scanner, resulting in an A4 file with A5 content centred within it.
Recommended Strategy: Normalise to Legal, as that is the largest and most common format in this stack. A4 pages will be scaled up slightly (about 2.8%) to match the Legal width, then centred vertically with roughly one inch of margin above and below. A5 content scanned inside an A4 MediaBox is scaled by the same 2.8% along with the rest of its page, so it keeps the white margins it already had around the A5 content block, which is accurate and expected for this document type.
Scenario 3: MEA Passport Application — Nomination Form Compilation
Online passport applications via the Passport Seva portal (MEA) often require combining the signed application form (A4), birth certificate scan (variable size), address proof utility bill (A4, sometimes rotated landscape), and police verification report (A4). Photographs must be embedded as image objects within the PDF at specific dimensions.
Recommended Strategy: Normalise to A4 portrait. Rotate landscape-scanned pages to portrait orientation using the /Rotate 270 or /Rotate 90 entry in the page dictionary before applying the scaling transformation. This produces a consistent portrait-oriented A4 document that meets the Passport Seva upload requirements.
Scenario 4: UPSC/NEET/JEE Application Document Package
Competitive examination applications require combining admit cards, mark sheets, caste certificates, and identity proofs. Admit cards are often A4 landscape (scanned from the centre's board). Mark sheets from state boards are A4 portrait. Caste certificates vary by issuing authority—Maharashtra's caste certificates are A4, while some other state formats are closer to A5. Identity proofs (Aadhaar, PAN) are compact sizes as described above.
Recommended Strategy: Normalise to A4 portrait. Landscape pages should be rotated before scaling. Compact-size identity cards (PAN, Aadhaar card format) should be scaled up with centering. The output will be a consistent A4 portrait package ready for NTA or state PSC portal submission.
Scenario 5: Architect / Civil Engineer: Combining Blueprint Sheets with Specification Pages
Architectural drawings are typically A1 or A3 landscape. Specification pages, schedules of materials, and structural notes are A4 portrait. Combining these into a single tender submission document is common in government infrastructure projects.
Recommended Strategy: Preserve original sizes. The tender reviewer needs to see the full-scale drawings and the A4 specification pages at their correct dimensions. Modern PDF viewers and plotter printers can handle mixed-size documents in the architectural context. Forcing everything to A4 would render the blueprint details unreadably small.
Scenario 6: Freelancer Tax Filing — GST Returns + Income Invoices + TDS Certificates
Freelancers filing GST returns manually combine GSTR-1/3B acknowledgement PDFs (A4, portrait), client invoices in various formats (some A4, some custom letterhead sizes like 595 × 770 pt), and Form 26AS TDS statements from the Income Tax e-Filing portal (A4, portrait). Some overseas clients send invoices in US Letter format (612 × 792 pt).
Recommended Strategy: Normalise to A4. US Letter pages (612 × 792 pt) are close in size to A4 (595 × 842 pt); the width is the limiting dimension, so the scale factor is about 0.97×, producing minimal visible change and a narrow margin (about half an inch) above and below the content. Custom letterhead pages will be proportionally scaled to A4, preserving the relative layout. The result is a professionally consistent A4 document package for the CA or tax portal upload.
7. How MojoDocs' WebAssembly Engine Handles Mixed-Size Merges
MojoDocs compiles a fully featured PDF manipulation library to WebAssembly, enabling all the operations described above to run in your browser. Here is the complete processing pipeline the engine executes when you merge PDFs with different page sizes:
Phase 1: File Ingestion and Metadata Extraction
When you drag files into the PDF Merger, the browser reads them via the HTML5 FileReader API as ArrayBuffer objects. These buffers are then made available to the WASM module's memory heap: without copying via SharedArrayBuffer where browser policies permit, or via a structured clone (which copies the bytes) where they do not. The WASM parser reads the PDF trailer, locates the xref table or cross-reference stream, and builds an in-memory object map. For each page object, it extracts the MediaBox, CropBox, Rotate entry, and resource dictionary.
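The first parsing step, locating the cross-reference data, can be sketched as follows. This is not the MojoDocs parser; it is a minimal Python illustration of reading the startxref offset from the end of a file, and the function name `find_startxref` and the 2048-byte tail window are assumptions for the example.

```python
def find_startxref(pdf_bytes, tail_size=2048):
    """Return the byte offset recorded after the last 'startxref' keyword."""
    tail = pdf_bytes[-tail_size:]
    marker = tail.rfind(b"startxref")
    if marker == -1:
        raise ValueError("startxref keyword not found near end of file")
    tokens = tail[marker + len(b"startxref"):].split()
    if not tokens or not tokens[0].isdigit():
        raise ValueError("startxref is not followed by a byte offset")
    return int(tokens[0].decode("ascii"))
```

The returned offset points at either a classic xref table or a cross-reference stream; a real parser would inspect the bytes at that position to decide which form it is dealing with.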
Phase 2: Layout Analysis and Strategy Selection
The engine collects all page dimension records from all input files and presents them in the user interface as a summary (e.g., "3 files, 12 pages: 8 × A4, 3 × Legal, 1 × Custom"). The user selects the layout normalisation strategy: Preserve, Normalise to A4, Normalise to US Letter, or Normalise to Largest. This selection is passed as a configuration parameter to the merge routine.
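A summary string of the kind quoted above could be produced with a simple counter. In this Python sketch the input is assumed to be one list of size labels per file (for example, labels produced by an earlier classification step); the function name `summarise` is hypothetical.

```python
from collections import Counter


def summarise(files):
    """files: one list of page-size labels per input file."""
    counts = Counter(label for pages in files for label in pages)
    total = sum(counts.values())
    # most_common() orders labels by descending page count.
    parts = ", ".join(f"{n} × {label}" for label, n in counts.most_common())
    return f"{len(files)} files, {total} pages: {parts}"


print(summarise([["A4"] * 8, ["Legal"] * 3, ["Custom"]]))
```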
Phase 3: Object Re-indexing and ID Collision Resolution
As with any multi-file PDF merge, the engine must resolve object ID namespace collisions. Every source PDF numbers its indirect objects starting from 1. When merging multiple files, the engine computes an offset for each subsequent file based on the maximum object ID used so far, then rewrites every object ID and every X Y R reference pointer in that file's object dictionary. This is a depth-first recursive traversal of the entire object graph for each source document.
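The offset computation and reference rewriting can be illustrated with a deliberately naive Python sketch. It assumes the input is the text of a single dictionary (not a string literal or stream payload, where a regular expression would produce false matches); the names `id_offsets` and `shift_references` are invented for this example.

```python
import re


def id_offsets(max_ids):
    """Given the highest object ID in each file, return the offset to add per file."""
    offsets, running = [], 0
    for max_id in max_ids:
        offsets.append(running)
        running += max_id
    return offsets


# Matches indirect references of the form "X Y R".
REFERENCE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def shift_references(dictionary_bytes, offset):
    """Add offset to the object number of every 'X Y R' reference."""
    return REFERENCE.sub(
        lambda m: b"%d %s R" % (int(m.group(1)) + offset, m.group(2)),
        dictionary_bytes,
    )
```

For instance, with three files whose highest object IDs are 10, 7, and 4, the offsets come out as 0, 10, and 17, so a reference to object 3 in the second file becomes a reference to object 13. A production engine would rewrite references on a parsed object graph rather than on raw bytes.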
Phase 4: Content Stream Transformation (The Scaling Step)
For pages that require scaling under the selected strategy, the engine decompresses the content stream (most PDF content streams use FlateDecode, i.e., zlib/deflate compression), prepends the q a 0 0 d e f cm operator block, appends the matching Q, and then recompresses the stream. It also updates the page's /MediaBox to the target canvas dimensions and removes or adjusts the /CropBox to match. Importantly, any embedded XObject image resources referenced on the page do not need to be resampled: the affine transformation is applied at the rendering layer, not the pixel layer, so the original image data is preserved unchanged.
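For a FlateDecode stream, the decode-transform-encode cycle can be sketched with Python's standard zlib module. The function name `rescale_flate_stream` is hypothetical, uniform scaling is assumed, and the caller is assumed to update the stream's /Length entry afterwards.

```python
import zlib


def rescale_flate_stream(compressed, scale, translate_x, translate_y):
    """Decompress a FlateDecode stream, wrap it in q/cm/Q, and recompress."""
    raw = zlib.decompress(compressed)
    prefix = (
        f"q\n{scale:.4f} 0 0 {scale:.4f} {translate_x:.4f} {translate_y:.4f} cm\n"
    ).encode("ascii")
    return zlib.compress(prefix + raw + b"\nQ\n")


original = zlib.compress(b"BT /F1 12 Tf (Hello) Tj ET")
scaled = rescale_flate_stream(original, 0.5, 10, 20)
```

Streams that use other filters, or a chain of filters, would need their own decoders before this step; the sketch covers only the common single-filter Flate case.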
Phase 5: Font and Image Resource Deduplication
If multiple source documents use the same embedded font (a common occurrence when all documents were created by the same application, e.g., all bank statements from the same HDFC Bank portal generator), the engine identifies these duplicates by comparing the font program binary hashes and redirects all page resource dictionary references to a single shared font object. This deduplication can reduce the final merged file size by 30–60% in document packages where the same institutional font is embedded in every source file.
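Hash-based deduplication of this kind can be sketched as a mapping from each font object to the first object seen with identical bytes. The function name `deduplicate_fonts`, the choice of SHA-256, and the dictionary-based input are assumptions for illustration only.

```python
import hashlib


def deduplicate_fonts(fonts):
    """fonts: dict of object_id -> font program bytes.

    Returns a dict mapping every object_id to the canonical object_id
    (the lowest ID holding byte-identical font data).
    """
    first_seen = {}
    redirect = {}
    for obj_id in sorted(fonts):
        digest = hashlib.sha256(fonts[obj_id]).hexdigest()
        redirect[obj_id] = first_seen.setdefault(digest, obj_id)
    return redirect


mapping = deduplicate_fonts({5: b"AAA", 12: b"AAA", 20: b"BBB"})
# Objects 5 and 12 share one font program; 20 stays separate.
```

Page resource dictionaries would then be rewritten to point at the canonical IDs, and the now-unreferenced duplicates omitted from the output.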
Phase 6: Page Tree Construction and Cross-Reference Table Generation
The engine builds a new root catalog, a new parent Pages object, and a Kids array containing references to all merged page objects in the user-specified order. The /Count value is set to the total number of output pages. A new xref table (or cross-reference stream for PDF 1.5+ compatibility) is generated, recording the precise byte offset of every indirect object in the final serialised byte stream. The trailer dictionary is written pointing to the root catalog and the xref, and the file is terminated with an %%EOF marker.
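The serialisation step can be illustrated with a toy writer that records byte offsets while emitting objects and then writes a classic xref table. This Python sketch assumes consecutively numbered objects with generation 0 and uses a hypothetical function name `build_pdf`; it is far simpler than a real engine, which must also handle streams, cross-reference streams, and free-object chains.

```python
def build_pdf(objects, root_id=1):
    """objects: list of object bodies (bytes); objects[i] becomes object i + 1."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for offset in offsets:
        # Each entry is exactly 20 bytes: 10-digit offset, 5-digit generation, flag, EOL.
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (len(objects) + 1, root_id)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_position
    return bytes(out)


pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595.276 841.890] >>",
])
```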
Phase 7: Blob Generation and Download
The final serialised PDF byte array is transferred from WASM heap memory back to the JavaScript main thread as a Uint8Array. The browser wraps this in a Blob with MIME type application/pdf, creates an object URL, and triggers a download. At no point does any network request carry your document data. The entire process, from file ingestion to download initiation, occurs within the browser's isolated memory sandbox.
The Flight Mode Verification
1. Open MojoDocs. 2. Turn off WiFi/Internet. 3. Process the file. 4. It completes instantly without any data leaving your device.
This verification works because the WASM binary module and all required page assets are cached in your browser's service worker cache on the first visit. Subsequent operations—including the complex mixed-size scaling transforms described in this article—execute against the locally cached module. If you run this same test on a cloud-based PDF merger, the tool will fail or show an error, confirming that it requires network connectivity to upload your files to a remote server.
8. Step-by-Step Guide: Merging PDFs with Different Page Sizes Using MojoDocs
Here is a practical, step-by-step walkthrough for merging a mixed-size document package using the MojoDocs PDF Merger. We will use the home loan KYC package (Aadhaar A4 + PAN custom size + bank statements A4) as our working example.
Step 1: Prepare Your Source Files
Collect all your source PDFs in one folder. For our example: aadhaar_ekyc.pdf (A4), epan_download.pdf (custom compact size), and bank_statements_6m.pdf (A4, 20 pages). If you have physical documents, scan them at 150 DPI in grayscale using your phone's document scanner app (Microsoft Lens, Adobe Scan, or the built-in iOS scanner in Notes). Scanning at 150 DPI gives you clean text without producing oversized files.
Pro Tip: If you do not have a scanner, many urban areas now have quick-commerce services like Blinkit or Zepto that partner with nearby print stores for document scanning. For ₹20–₹50, a courier can pick up your physical documents, scan them, and deliver the digital files back to your inbox. Once you receive the PDFs, merge them locally using MojoDocs—never upload them to the scanning service's own cloud portal for merging.
Step 2: Open the MojoDocs PDF Merger
Navigate to the MojoDocs PDF Merger in your browser. On the first visit, the page downloads the WASM processing module, which takes under two seconds on a standard broadband connection. On subsequent visits, it loads from the browser cache with no network dependency.
Step 3: Add Your PDF Files
Drag all three source files into the file drop zone, or click the "Select Files" button to open your system's file picker. MojoDocs will immediately parse the metadata of each file locally and display them as cards in the workspace, showing page count and detected page size for each document.
Step 4: Review the Page Size Summary
After you add the files, the tool shows you a page-size breakdown. In our example, you will see that the Aadhaar and bank statement files are A4, while the ePAN file has a compact custom size. This confirmation tells you that size normalisation is required before the documents can be meaningfully combined.
Step 5: Select the Layout Normalisation Strategy
In the settings panel, choose "Normalise all pages to A4" as your layout strategy. This is the correct choice for Indian portal submissions. The WASM engine will compute the appropriate affine transformation matrix for the PAN card page, scaling it up proportionally and centering it within an A4 canvas.
Step 6: Arrange the File Order
Drag the file cards into the order required by the bank's KYC checklist—typically: (1) Aadhaar, (2) PAN, (3) Bank Statements. The tool compiles pages in top-to-bottom card order. You can also expand individual files to reorder their internal pages if needed.
Step 7: Run the Merge
Click the "Merge PDFs" button. The WASM engine executes the full pipeline—re-indexing objects, applying the scaling transformation to the PAN page, rebuilding the page tree, and serialising the output—in a background Web Worker thread. This keeps the browser interface responsive. For a 22-page package, the operation completes in under three seconds on a modern device.
Step 8: Download and Verify
Click the download button to save the merged PDF. Open it in your default PDF viewer and verify: all pages should be A4 portrait, the PAN card content should be centred and clearly readable, and the page count should match your expected total. You are now ready to upload this consolidated KYC package directly to the bank portal.
9. Advanced Considerations for Professional Workflows
9.1 Handling Rotated Pages from Scanner Trays
Scanners that feed sheets from the short edge often produce pages stored in landscape orientation with a /Rotate 90 or /Rotate 270 entry in their page dictionaries. These pages look correct in most PDF viewers (which apply the Rotate metadata when rendering), but their raw MediaBox dimensions are transposed: an A4 page that displays as portrait may declare [0 0 842 595] in its MediaBox. A size-aware merge engine must read the Rotate entry before computing scale factors. The effective width and height for scaling purposes are:
- If Rotate = 0 or 180: effective_width = MediaBox[2] - MediaBox[0], effective_height = MediaBox[3] - MediaBox[1]
- If Rotate = 90 or 270: effective_width = MediaBox[3] - MediaBox[1], effective_height = MediaBox[2] - MediaBox[0]
Failing to account for this causes the scaling matrix to be computed against the transposed dimensions, producing pages that are scaled incorrectly and then rendered sideways—a very common bug in lower-quality merge tools.
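The rotation rule above translates directly into code. The following Python sketch uses a hypothetical function name `effective_size` and assumes the Rotate value is a multiple of 90, normalising negative values before the check.

```python
def effective_size(media_box, rotate=0):
    """Return (width, height) as displayed, after applying the page's Rotate entry."""
    x0, y0, x1, y1 = media_box
    width, height = x1 - x0, y1 - y0
    rotate %= 360  # maps -90 to 270, 360 to 0, and so on
    if rotate not in (0, 90, 180, 270):
        raise ValueError("Rotate must be a multiple of 90")
    # A quarter turn swaps the displayed width and height.
    return (height, width) if rotate in (90, 270) else (width, height)


print(effective_size([0, 0, 842, 595], 90))
```

The scale factor would then be computed from these effective dimensions, while the transformation matrix itself still has to be expressed in the page's unrotated user space.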
9.2 Merging Colour-Space-Incompatible Pages
Some source PDFs use the CMYK colour space (typical of professionally printed brochures or commercial documents), while others use RGB (typical of on-screen documents and web-generated PDFs). When merging these, the colour space declarations in each page's resource dictionary must be preserved as-is. The merge engine should not attempt to convert colour spaces: conversion is lossy, depends on the colour profiles involved, and is outside the scope of a structural merge operation. PDF viewers are equipped to handle per-page colour space variation in a single document.
9.3 Preserving Accessibility Tags During Scaled Merges
Tagged PDFs (including PDF/UA-compliant documents) carry a logical structure tree (/StructTreeRoot, flagged by the /MarkInfo entry in the document catalog) that maps visual content to semantic elements (headings, paragraphs, tables, list items). When the content stream is transformed with an affine matrix, the physical coordinates of the tagged content regions change. An advanced merge engine must update the bounding box attributes in the structure tree to match the post-transformation coordinates. Failing to do this breaks screen reader compatibility and may cause accessibility audit failures for documents submitted to public sector bodies that mandate PDF/UA compliance under the Rights of Persons with Disabilities Act (RPwD) 2016.
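To make the bounding box update concrete, here is a minimal sketch of how a rectangle can be carried through a PDF-style matrix [a b c d e f]. The helper name `transform_bbox` is hypothetical, and the sketch covers only the arithmetic; locating and rewriting the attributes inside a real structure tree is not shown. All four corners are transformed so the result stays correct even if the matrix includes a rotation.

```python
from typing import Sequence, Tuple


def transform_bbox(
    bbox: Sequence[float], matrix: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Map [x0, y0, x1, y1] through a PDF matrix [a, b, c, d, e, f].

    PDF applies the matrix as: x' = a*x + c*y + e, y' = b*x + d*y + f.
    """
    a, b, c, d, e, f = matrix
    x0, y0, x1, y1 = bbox
    corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]

    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]

    # The new box is the axis-aligned envelope of the transformed corners.
    return (min(xs), min(ys), max(xs), max(ys))


# Scale by 2 and shift by (10, 20).
print(transform_bbox([0, 0, 100, 50], [2, 0, 0, 2, 10, 20]))  # (10, 20, 210, 120)
```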
9.4 Compression Strategy for Mixed-Content Merged Documents
After merging, the file size may be larger than expected because: (1) decompressed and recompressed content streams may have suboptimal compression ratios compared to the originals, and (2) font deduplication may not fully offset the overhead of the transformation matrices. If the merged file needs to meet a strict size limit—such as the 2MB limit for Parivahan portal uploads or the 500KB limit for some NSDL form attachments—you should run the output through a separate compression pass. MojoDocs offers a PDF Compressor tool that applies aggressive but lossless stream compression (optimised Flate parameters) and optional image downsampling, reducing file sizes by 40–75% without destroying text layer quality.
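The size effect of a decompress, modify, recompress cycle can be explored with Python's standard `zlib` module. This is a sketch under stated assumptions rather than a description of MojoDocs' internals: it assumes the content stream is a single Flate-compressed stream, that the transformation is applied by wrapping the stream in the standard `q` / `cm` / `Q` operators, and the helper names `fmt_number` and `rewrap_stream` are hypothetical.

```python
import zlib
from typing import Sequence


def fmt_number(value: float) -> str:
    """Format a number without exponent notation, which PDF does not allow."""
    return f"{value:.6f}".rstrip("0").rstrip(".")


def rewrap_stream(compressed: bytes, matrix: Sequence[float], level: int = 9) -> bytes:
    """Decompress a Flate stream, prepend a cm matrix, and recompress it."""
    raw = zlib.decompress(compressed)
    cm = " ".join(fmt_number(v) for v in matrix).encode("ascii")

    # q saves the graphics state, cm applies the matrix, Q restores the state.
    wrapped = b"q\n" + cm + b" cm\n" + raw + b"\nQ\n"

    # Level 9 is zlib's strongest setting; it costs CPU time, not quality.
    return zlib.compress(wrapped, level)


# Synthetic content stream used only for illustration.
sample = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 50
original = zlib.compress(sample, 6)
rewritten = rewrap_stream(original, [0.5, 0, 0, 0.5, 10, 20])
print(len(original), len(rewritten))
```

Comparing the two printed lengths for your own streams shows how much the added operators and the compression level actually change the size in practice.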
10. Frequently Asked Questions about Merging PDFs with Different Page Sizes
- What happens to the content if I scale a small PAN card page up to A4?
The text, lines, and vector elements on the PAN card are mathematical objects (PDF path and text operators, which descend from PostScript), not pixels. Scaling them up with an affine transformation matrix keeps them sharp at any zoom level. There is no pixelation or blurring unless the PAN card was originally embedded as a raster image (JPEG scan), in which case the scan resolution of the source file sets the clarity limit.
- Will the A4-normalised pages look identical to the original source documents?
For pages that were already A4, yes—no transformation is applied and they are reproduced identically. For non-A4 pages, the content is proportionally scaled (aspect ratio preserved) and centred. The visual layout is maintained exactly; only the relative size on the page changes. A bank officer reviewing the document will see all information clearly and in correct proportion.
- Can I merge PDFs with landscape and portrait pages in the same document?
Yes. A PDF supports mixed portrait and landscape pages within a single document. Each page retains its own Rotate and MediaBox entries. If you are normalising sizes, the engine scales each page to the target canvas independently, respecting its rotation. Landscape pages normalised to A4 will produce A4 landscape pages (wider than tall) within the otherwise portrait document—this is valid PDF and renders correctly in all standard viewers.
- Does the scaling transformation affect file size?
Yes, slightly. Decompressing, modifying, and recompressing a content stream with zlib typically produces a stream 5–15% larger than the original due to the addition of the matrix operator bytes and potential changes in the compression entropy. However, font deduplication across files usually more than offsets this increase in a multi-file merge, resulting in a net file size similar to or smaller than the sum of the source files.
- Is there a risk of content being cropped when scaling up a small page to A4?
No. The scaling strategy uses a "fit" algorithm: the scale factor is the minimum of the horizontal and vertical scale ratios, ensuring that neither dimension exceeds the target canvas. Content is never clipped during a scale-up operation. In a scale-down operation (e.g., normalising Legal to A4), the same fit algorithm ensures all content fits within the A4 canvas with proportional white margins on the shorter dimension.
- What should I do if the merged PDF is rejected by the government portal for being too large?
Run the merged output through MojoDocs' PDF Compressor before uploading. Scanned images embedded in PDFs are the primary driver of file size. The compressor can downsample images to 150 DPI (sufficient for text legibility in official documents) and apply optimised Flate compression, typically reducing total size by 50–80% while keeping all text perfectly readable for government officers reviewing the submission.
- Can I verify that MojoDocs is not sending my documents to any server?
Absolutely. Use the Flight Mode Verification: open the tool, disconnect your internet, and run a merge. The operation completes locally. Alternatively, open your browser's Developer Tools (F12), go to the Network tab, clear the log, run a merge, and inspect the requests. You will see no HTTP POST or PUT requests with your file data—the only network activity after the initial page load is analytics pings (which contain no file data) or update checks for the WASM module version.
- Does MojoDocs handle PDFs generated by Tally or other Indian accounting software?
Yes. PDFs exported from Tally ERP 9, TallyPrime, or other accounting software are standard PDF files (ISO 32000) and are handled by the WASM engine like any other input. Tally typically exports A4 portrait PDFs with embedded standard fonts, which merge cleanly with other A4 documents without any size conflict.
- Can I merge more than 10 PDFs at once?
There is no hard limit enforced by MojoDocs. The practical limit depends on your device's available RAM. Each page consumes a modest amount of memory during the merge phase (a few kilobytes of object graph data plus the compressed content stream bytes). On a modern device with 8GB RAM, you can comfortably merge hundreds of pages in a single session. For very large batches (50+ files with many pages each), we recommend processing in sub-batches of 20–30 files and then merging the intermediate outputs.
- Is the tool free for commercial use, such as processing client documents in a CA firm or law office?
Yes. MojoDocs is completely free with no usage restrictions for personal or commercial use. Because all processing happens on your own device, there are no per-document server costs for MojoDocs to recoup. This makes it suitable for high-volume professional environments without any licensing fees or per-document charges.
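As a companion to the cropping and landscape questions above, the "fit" rule can be sketched as follows. This is an illustration under assumptions, not MojoDocs' implementation: the names `target_canvas` and `fit_matrix` are hypothetical, A4 is taken as 595 × 842 points, the inputs are the effective (post-Rotate) page dimensions, and the source page's lower-left corner is assumed to sit at (0, 0).

```python
from typing import Tuple

A4_PORTRAIT = (595.0, 842.0)  # points, rounded as elsewhere in this guide


def target_canvas(src_w: float, src_h: float) -> Tuple[float, float]:
    """Pick A4 landscape for landscape sources, A4 portrait otherwise."""
    w, h = A4_PORTRAIT
    return (h, w) if src_w > src_h else (w, h)


def fit_matrix(src_w: float, src_h: float) -> Tuple[float, ...]:
    """Return a PDF matrix (a, b, c, d, e, f) that fits and centres the page."""
    if src_w <= 0 or src_h <= 0:
        raise ValueError("page dimensions must be positive")

    dst_w, dst_h = target_canvas(src_w, src_h)

    # Taking the smaller ratio guarantees neither dimension overflows.
    scale = min(dst_w / src_w, dst_h / src_h)

    # Split the leftover space evenly so the content is centred.
    tx = (dst_w - scale * src_w) / 2
    ty = (dst_h - scale * src_h) / 2
    return (scale, 0.0, 0.0, scale, tx, ty)


# US Legal (612 x 1008 pt) scaled down onto A4 portrait.
print(fit_matrix(612, 1008))
# A page that is already A4 needs no change.
print(fit_matrix(595, 842))  # (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
```

Because the same uniform scale is used on both axes, the aspect ratio is preserved, and the unused space shows up as equal margins on the two sides of the looser dimension.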
11. The Environmental and Efficiency Case for Local Processing
Beyond privacy and cost, there is a meaningful efficiency and environmental argument for client-side PDF processing. When you upload files to a cloud merger, the data travels from your device over your ISP's network to a CDN edge node, from there to a data centre load balancer, from there to a worker server, and back along the same path with the processed output. Each hop consumes network bandwidth, router CPU cycles, and data centre cooling energy.
A typical home loan KYC package of 10 pages weighs about 5MB after scanning. Uploading and downloading via a cloud service consumes approximately 10MB of bandwidth per merge operation. For an office processing 50 such packages per day, that is 500MB of daily bandwidth consumed purely for document merging. At a commercial fibre plan rate, this bandwidth has a small but non-zero monetary cost—and the associated carbon emission from data centre processing is real and measurable.
By processing locally with MojoDocs, this document traffic is eliminated entirely. The WASM module is 2–4MB and is cached after the first download. All subsequent merges, regardless of file size, transmit none of your document data. For high-volume document processing environments (corporate offices, CA firms, legal practices), the aggregate bandwidth and energy savings over a year are substantial.
12. Conclusion: Own Your Documents, Own Their Geometry
The challenge of merging PDFs with different page sizes is not merely a user interface problem. It is a technical problem with roots in the PDF specification's per-page coordinate geometry model, the mathematics of affine transformation, and the engineering of content stream processing pipelines. Most consumer tools paper over this complexity either by ignoring page sizes entirely (producing inconsistent layouts) or by relying on server-side processing (which requires uploads that compromise privacy).
MojoDocs addresses the problem at its technical root: a fully featured WebAssembly PDF processing engine that reads MediaBox and CropBox dimensions, computes size-aware affine transformation matrices, decompresses and re-encodes content streams with the correct transformation operators, deduplicates font resources, and rebuilds a standards-compliant page tree. All of this happens within your browser's local memory sandbox, with zero network transmission of your document data.
For Indian users managing a document ecosystem that spans UIDAI Aadhaar, NSDL PAN, MEA Passports, Parivahan DL/RC records, and a diverse range of institutional formats, this local-first approach is the only one that is simultaneously technically correct, privacy-preserving, and economically rational. Visit the MojoDocs PDF Merger to merge your mixed-size documents locally, free of charge, and with complete confidence that your files never leave your device.

