
Type Definitions

Complete reference for all TypeScript interfaces and types exported by undms.

Input Types

Document

Input document interface for extraction.

```ts
interface Document {
  name: string;
  size: number;
  type: string;
  lastModified: number;
  webkitRelativePath: string;
  buffer: Buffer;
}
```

Properties

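| Property | Type | Description |
| --- | --- | --- |
| `name` | `string` | File name |
| `size` | `number` | File size in bytes |
| `type` | `string` | MIME type (e.g., `text/plain`, `application/pdf`) |
| `lastModified` | `number` | Last modified timestamp (Unix epoch, milliseconds) |
| `webkitRelativePath` | `string` | Relative file path, mirroring the browser `File.webkitRelativePath` property |
| `buffer` | `Buffer` | File content as a Node.js `Buffer` |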

Example

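```ts
import { readFileSync } from 'node:fs';
import type { Document } from 'undms';

// Load the file contents; the path is only an example
const pdfData = readFileSync('documents/report.pdf');

// Named `doc` to avoid shadowing the browser's global `document`
const doc: Document = {
  name: 'report.pdf',
  size: pdfData.byteLength, // actual size of the buffer in bytes
  type: 'application/pdf',
  lastModified: Date.now(),
  webkitRelativePath: 'documents/report.pdf',
  buffer: pdfData,
};
```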
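Node.js has no browser `File` object to copy these fields from, so you fill them in yourself. Below is a minimal sketch for Node.js. `documentFromFile` and its extension-to-MIME table are hypothetical helpers, not part of undms. The `Document` interface is repeated locally so the snippet compiles without the package installed.

```ts
import { readFileSync, statSync } from 'node:fs';
import { basename, extname } from 'node:path';

// Local copy of the Document interface from this page (normally imported from 'undms').
interface Document {
  name: string;
  size: number;
  type: string;
  lastModified: number;
  webkitRelativePath: string;
  buffer: Buffer;
}

// Hypothetical extension-to-MIME table; extend it for the formats you need.
const MIME_BY_EXTENSION: Record<string, string> = {
  '.txt': 'text/plain',
  '.pdf': 'application/pdf',
  '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  '.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
};

// Builds a Document from a file on disk (hypothetical helper).
export function documentFromFile(path: string, relativePath = ''): Document {
  const buffer = readFileSync(path);
  const stats = statSync(path);
  return {
    name: basename(path),
    size: buffer.byteLength, // size of the bytes actually read
    type: MIME_BY_EXTENSION[extname(path).toLowerCase()] ?? 'application/octet-stream',
    lastModified: Math.floor(stats.mtimeMs), // whole Unix epoch milliseconds
    webkitRelativePath: relativePath,
    buffer,
  };
}
```

Taking `size` from the buffer rather than from `stats.size` keeps `size` and `buffer` consistent even if the file changes between the two calls.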

Output Types

DocumentMetadata

Metadata result for extracted documents.

```ts
interface DocumentMetadata {
  name: string;
  size: number;
  processingTime: number;
  encoding: string;
  content: string;
  metadata?: MetadataPayload;
  error?: string;
}
```

Properties

| Property | Type | Description |
| --- | --- | --- |
| `name` | `string` | Original file name |
| `size` | `number` | File size in bytes |
| `processingTime` | `number` | Extraction time in milliseconds |
| `encoding` | `string` | Detected encoding or MIME type |
| `content` | `string` | Extracted text content |
| `metadata` | `MetadataPayload` | Format-specific metadata (optional) |
| `error` | `string` | Error message if extraction failed (optional) |
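`error` is only set when extraction fails, so callers can separate failures from usable results before reading `content`. The sketch below assumes that failed documents still appear in the result array with `error` filled in. `splitByOutcome` is a hypothetical helper, and the interface is trimmed to the fields the helper uses.

```ts
// Subset of DocumentMetadata from this page (normally imported from 'undms').
interface DocumentMetadata {
  name: string;
  content: string;
  error?: string;
}

// Partitions results into successes and failures; an empty error string counts as no error.
export function splitByOutcome(docs: DocumentMetadata[]): {
  succeeded: DocumentMetadata[];
  failed: { name: string; error: string }[];
} {
  const succeeded: DocumentMetadata[] = [];
  const failed: { name: string; error: string }[] = [];
  for (const doc of docs) {
    if (doc.error) {
      failed.push({ name: doc.name, error: doc.error });
    } else {
      succeeded.push(doc);
    }
  }
  return { succeeded, failed };
}
```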

DocumentMetadataWithSimilarity

Document metadata with similarity results.

```ts
interface DocumentMetadataWithSimilarity {
  name: string;
  size: number;
  processingTime: number;
  encoding: string;
  content: string;
  metadata?: MetadataPayload;
  error?: string;
  similarityMatches: SimilarityMatch[];
}
```

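Has every field of `DocumentMetadata` plus `similarityMatches`, the list of `SimilarityMatch` results for this document. TypeScript typing is structural, so a `DocumentMetadataWithSimilarity` value can be used wherever a `DocumentMetadata` is expected.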
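A common first step is to find the strongest match for a document. Below is a minimal sketch with a hypothetical `bestMatch` helper. It assumes an empty `similarityMatches` array means that no reference matched. The interfaces are trimmed local copies.

```ts
interface SimilarityMatch {
  referenceIndex: number;
  similarityPercentage: number;
}

// Subset of DocumentMetadataWithSimilarity (normally imported from 'undms').
interface DocumentMetadataWithSimilarity {
  name: string;
  content: string;
  similarityMatches: SimilarityMatch[];
}

// Returns the match with the highest percentage (first one wins on ties), or undefined if none.
export function bestMatch(doc: DocumentMetadataWithSimilarity): SimilarityMatch | undefined {
  let best: SimilarityMatch | undefined;
  for (const match of doc.similarityMatches) {
    if (best === undefined || match.similarityPercentage > best.similarityPercentage) {
      best = match;
    }
  }
  return best;
}
```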


GroupedDocuments

Documents grouped by MIME type.

```ts
interface GroupedDocuments {
  mimeType: string;
  documents: DocumentMetadata[];
}
```
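Results come back grouped by MIME type. When you need a flat list, or only the documents of one type, you can flatten the groups or index them by `mimeType`. The helper names below are hypothetical. Merging repeated MIME types is a defensive choice; this page does not say whether duplicates can occur.

```ts
// Trimmed local copies of the types on this page (normally imported from 'undms').
interface DocumentMetadata {
  name: string;
  content: string;
}

interface GroupedDocuments {
  mimeType: string;
  documents: DocumentMetadata[];
}

// Concatenates every group's documents, preserving group order.
export function flattenGroups(groups: GroupedDocuments[]): DocumentMetadata[] {
  return groups.flatMap((group) => group.documents);
}

// Builds a MIME type -> documents lookup; repeated MIME types are merged.
export function indexByMimeType(groups: GroupedDocuments[]): Map<string, DocumentMetadata[]> {
  const index = new Map<string, DocumentMetadata[]>();
  for (const group of groups) {
    const existing = index.get(group.mimeType) ?? [];
    index.set(group.mimeType, existing.concat(group.documents));
  }
  return index;
}
```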

GroupedDocumentsWithSimilarity

Grouped documents with similarity data.

```ts
interface GroupedDocumentsWithSimilarity {
  mimeType: string;
  documents: DocumentMetadataWithSimilarity[];
}
```
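Each document inside a group carries its own `similarityMatches`. Answering "which documents matched reference N?" therefore means walking every group. Below is a sketch with a hypothetical `documentsMatchingReference` helper. It assumes a document has at most one match per reference and uses the first one it finds.

```ts
// Trimmed local copies of the types on this page (normally imported from 'undms').
interface SimilarityMatch {
  referenceIndex: number;
  similarityPercentage: number;
}

interface DocumentMetadataWithSimilarity {
  name: string;
  similarityMatches: SimilarityMatch[];
}

interface GroupedDocumentsWithSimilarity {
  mimeType: string;
  documents: DocumentMetadataWithSimilarity[];
}

// Names of documents (across all groups) matching referenceIndex, highest percentage first.
export function documentsMatchingReference(
  groups: GroupedDocumentsWithSimilarity[],
  referenceIndex: number,
): { name: string; similarityPercentage: number }[] {
  const hits: { name: string; similarityPercentage: number }[] = [];
  for (const group of groups) {
    for (const doc of group.documents) {
      const match = doc.similarityMatches.find((m) => m.referenceIndex === referenceIndex);
      if (match) {
        hits.push({ name: doc.name, similarityPercentage: match.similarityPercentage });
      }
    }
  }
  return hits.sort((a, b) => b.similarityPercentage - a.similarityPercentage);
}
```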

SimilarityMatch

Similarity comparison result.

```ts
interface SimilarityMatch {
  referenceIndex: number;
  similarityPercentage: number;
}
```

Properties

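| Property | Type | Description |
| --- | --- | --- |
| `referenceIndex` | `number` | Zero-based index into the array of reference texts passed to the similarity function |
| `similarityPercentage` | `number` | Similarity score as a percentage (0-100) |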
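`referenceIndex` is a position in the reference texts array you passed in, not an identifier. Resolving a match back to its text is therefore a plain array lookup. Below is a sketch; the `describeMatches` name and its output format are illustrative only.

```ts
interface SimilarityMatch {
  referenceIndex: number;
  similarityPercentage: number;
}

// Pairs each match with the reference text it points to.
export function describeMatches(matches: SimilarityMatch[], references: string[]): string[] {
  return matches.map((match) => {
    const reference = references[match.referenceIndex];
    if (reference === undefined) {
      throw new RangeError(`referenceIndex ${match.referenceIndex} is out of range`);
    }
    // Percentages are on a 0-100 scale; one decimal place is enough for display.
    return `${match.similarityPercentage.toFixed(1)}% similar to "${reference}"`;
  });
}
```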

Metadata Types

MetadataPayload

Complete metadata payload with format-specific fields.

```ts
interface MetadataPayload {
  text?: TextMetadata;
  docx?: DocxMetadata;
  xlsx?: XlsxMetadata;
  pdf?: PdfMetadata;
  image?: ImageMetadata;
}
```

Contains one or more format-specific metadata objects depending on the document type.
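Every field of `MetadataPayload` is optional, so code that handles several formats usually checks which fields are present before reading them. Below is a minimal sketch with a hypothetical `presentFormats` helper. The per-format types are reduced to `object` to keep it short. Both `null` and `undefined` count as absent, in case the bindings use either.

```ts
// MetadataPayload with per-format types elided (normally imported from 'undms').
interface MetadataPayload {
  text?: object;
  docx?: object;
  xlsx?: object;
  pdf?: object;
  image?: object;
}

const FORMAT_KEYS = ['text', 'docx', 'xlsx', 'pdf', 'image'] as const;
type FormatKey = (typeof FORMAT_KEYS)[number];

// Returns the format keys that are populated, in declaration order.
export function presentFormats(payload: MetadataPayload | undefined): FormatKey[] {
  const present: MetadataPayload = payload ?? {};
  return FORMAT_KEYS.filter((key) => present[key] != null);
}
```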


TextMetadata

Text content statistics.

```ts
interface TextMetadata {
  lineCount: number;
  wordCount: number;
  characterCount: number;
  nonWhitespaceCharacterCount: number;
}
```

Properties

| Property | Type | Description |
| --- | --- | --- |
| `lineCount` | `number` | Number of lines |
| `wordCount` | `number` | Total word count |
| `characterCount` | `number` | Total characters, including whitespace |
| `nonWhitespaceCharacterCount` | `number` | Characters excluding whitespace |
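The sketch below computes the four counters from a string, to make them concrete. It is an approximation for illustration only. undms may define lines, words, and characters differently (for example, how a trailing newline is treated, or whether characters are code points or UTF-16 units), so its numbers may not match.

```ts
interface TextMetadata {
  lineCount: number;
  wordCount: number;
  characterCount: number;
  nonWhitespaceCharacterCount: number;
}

// Illustrative approximation of TextMetadata; not the library's own algorithm.
export function approximateTextMetadata(text: string): TextMetadata {
  const codePoints = Array.from(text); // Unicode code points, not UTF-16 units
  return {
    // Empty text has no lines; a trailing newline starts an empty final line here.
    lineCount: text.length === 0 ? 0 : text.split(/\r\n|\r|\n/).length,
    // Words are maximal runs of non-whitespace characters.
    wordCount: text.split(/\s+/).filter((w) => w.length > 0).length,
    characterCount: codePoints.length,
    nonWhitespaceCharacterCount: codePoints.filter((c) => !/\s/.test(c)).length,
  };
}
```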

DocxMetadata

DOCX-specific metadata.

```ts
interface DocxMetadata {
  paragraphCount: number;
  tableCount: number;
  imageCount: number;
  hyperlinkCount: number;
}
```

Properties

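| Property | Type | Description |
| --- | --- | --- |
| `paragraphCount` | `number` | Total number of paragraphs |
| `tableCount` | `number` | Number of tables |
| `imageCount` | `number` | Number of embedded images |
| `hyperlinkCount` | `number` | Number of hyperlinks in the document |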

XlsxMetadata

XLSX-specific metadata.

```ts
interface XlsxMetadata {
  sheetCount: number;
  sheetNames: string[];
  rowCount: number;
  columnCount: number;
  cellCount: number;
}
```

Properties

| Property | Type | Description |
| --- | --- | --- |
| `sheetCount` | `number` | Number of worksheets |
| `sheetNames` | `string[]` | Names of all sheets |
| `rowCount` | `number` | Total rows across all sheets |
| `columnCount` | `number` | Maximum columns in any sheet |
| `cellCount` | `number` | Total cells with content |
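These counters can be combined into a rough fill ratio. `rowCount` sums rows over all sheets and `columnCount` is the widest sheet, so `rowCount × columnCount` is an upper bound on the number of cells. That makes `cellCount / (rowCount × columnCount)` fall between 0 and 1. The sketch assumes each field means exactly what the table says; `sheetSummary` is a hypothetical helper.

```ts
interface XlsxMetadata {
  sheetCount: number;
  sheetNames: string[];
  rowCount: number;
  columnCount: number;
  cellCount: number;
}

export function sheetSummary(meta: XlsxMetadata): { fillRatio: number; namesConsistent: boolean } {
  const capacity = meta.rowCount * meta.columnCount; // upper bound on cells across all sheets
  return {
    // Fraction of that bound holding content; 0 for an empty workbook.
    fillRatio: capacity === 0 ? 0 : meta.cellCount / capacity,
    // sheetNames should list exactly one name per worksheet.
    namesConsistent: meta.sheetNames.length === meta.sheetCount,
  };
}
```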

PdfMetadata

PDF-specific metadata.

```ts
interface PdfMetadata {
  title?: string;
  author?: string;
  subject?: string;
  producer?: string;
  pageSize?: PdfPageSize;
  pageCount: number;
}
```

Properties

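| Property | Type | Description |
| --- | --- | --- |
| `title` | `string` | Document title (optional) |
| `author` | `string` | Document author (optional) |
| `subject` | `string` | Document subject (optional) |
| `producer` | `string` | Application that produced the PDF (optional) |
| `pageSize` | `PdfPageSize` | Dimensions of the first page (optional) |
| `pageCount` | `number` | Total number of pages |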

PdfPageSize

PDF page dimensions.

```ts
interface PdfPageSize {
  width: number;
  height: number;
}
```
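This page gives no unit for the page size. PDF's default user-space unit is 1/72 inch (a point). If undms reports raw PDF dimensions, they convert to millimetres as shown below. Treat that unit as an assumption and check it against a known document: US Letter is 612 × 792 points. The orientation check does not depend on the unit. Both helper names are hypothetical.

```ts
interface PdfPageSize {
  width: number;
  height: number;
}

const MM_PER_POINT = 25.4 / 72; // assumes dimensions are PDF points (1/72 inch)

export function pageSizeInMillimetres(size: PdfPageSize): PdfPageSize {
  return { width: size.width * MM_PER_POINT, height: size.height * MM_PER_POINT };
}

// Unit-independent: compares the two sides directly.
export function orientation(size: PdfPageSize): 'portrait' | 'landscape' | 'square' {
  if (size.width === size.height) return 'square';
  return size.width > size.height ? 'landscape' : 'portrait';
}
```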

ImageMetadata

Image-specific metadata.

```ts
interface ImageMetadata {
  width: number;
  height: number;
  format?: string;
  cameraMake?: string;
  cameraModel?: string;
  datetimeOriginal?: string;
  location: ImageLocation;
}
```

Properties

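| Property | Type | Description |
| --- | --- | --- |
| `width` | `number` | Image width in pixels |
| `height` | `number` | Image height in pixels |
| `format` | `string` | Image format, e.g. JPEG or PNG (optional) |
| `cameraMake` | `string` | Camera manufacturer (optional) |
| `cameraModel` | `string` | Camera model (optional) |
| `datetimeOriginal` | `string` | Date and time the photo was taken (optional) |
| `location` | `ImageLocation` | GPS coordinates; always present, but its fields may be absent |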

ImageLocation

GPS coordinates from EXIF data.

```ts
interface ImageLocation {
  latitude?: number;
  longitude?: number;
}
```
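`location` is always present on `ImageMetadata`, but both of its fields are optional. An image without GPS tags can therefore still have a `location` object. A type guard makes that check explicit. `hasCoordinates` and `formatLocation` are hypothetical helpers, and the output format is an arbitrary choice.

```ts
interface ImageLocation {
  latitude?: number;
  longitude?: number;
}

// Narrows to a location where both coordinates are present.
export function hasCoordinates(loc: ImageLocation): loc is { latitude: number; longitude: number } {
  return typeof loc.latitude === 'number' && typeof loc.longitude === 'number';
}

// Five decimal places is roughly metre precision; returns undefined without full GPS data.
export function formatLocation(loc: ImageLocation): string | undefined {
  if (!hasCoordinates(loc)) return undefined;
  return `${loc.latitude.toFixed(5)}, ${loc.longitude.toFixed(5)}`;
}
```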

Type Aliases

SimilarityMethod

Valid similarity algorithms.

```ts
type SimilarityMethod = 'jaccard' | 'ngram' | 'levenshtein' | 'hybrid';
```
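A string from a CLI flag or query parameter must be checked before it can be used as a `SimilarityMethod`. Below is a sketch of a runtime guard. `SIMILARITY_METHODS`, `isSimilarityMethod`, and `parseSimilarityMethod` are hypothetical names. Falling back to `'hybrid'` is a choice made here, not a library default.

```ts
type SimilarityMethod = 'jaccard' | 'ngram' | 'levenshtein' | 'hybrid'; // as on this page

const SIMILARITY_METHODS: readonly SimilarityMethod[] = ['jaccard', 'ngram', 'levenshtein', 'hybrid'];

export function isSimilarityMethod(value: string): value is SimilarityMethod {
  return (SIMILARITY_METHODS as readonly string[]).includes(value);
}

// Trims and lower-cases the input, falling back when it is missing or unknown.
export function parseSimilarityMethod(
  value: string | undefined,
  fallback: SimilarityMethod = 'hybrid',
): SimilarityMethod {
  const normalised = value?.trim().toLowerCase();
  return normalised !== undefined && isSimilarityMethod(normalised) ? normalised : fallback;
}
```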

Complete Usage Example

```ts
import { extract, computeDocumentSimilarity, computeTextSimilarity } from 'undms';
import type {
  Document,
  DocumentMetadata,
  DocumentMetadataWithSimilarity,
  SimilarityMatch,
  TextMetadata,
} from 'undms';

const buffer = Buffer.from('Sample content');

const documents: Document[] = [
  {
    name: 'report.txt',
    size: buffer.byteLength, // keep size in sync with the actual content
    type: 'text/plain',
    lastModified: Date.now(),
    webkitRelativePath: '',
    buffer,
  },
];

// Extract: results are grouped by MIME type
const extractResults = extract(documents);
const extractedDoc: DocumentMetadata = extractResults[0].documents[0];

// Access metadata
if (extractedDoc.metadata?.text) {
  const textMeta: TextMetadata = extractedDoc.metadata.text;
  console.log(textMeta.wordCount);
}

// Compute document similarity
const similarityResults = computeDocumentSimilarity(documents, ['reference text'], 50, 'hybrid');
const docWithSimilarity: DocumentMetadataWithSimilarity = similarityResults[0].documents[0];

docWithSimilarity.similarityMatches.forEach((match: SimilarityMatch) => {
  console.log(match.referenceIndex, match.similarityPercentage);
});

// Compute text similarity
const textMatches = computeTextSimilarity('source text', ['reference'], 30, 'hybrid');
console.log(textMatches);
```

Released under the MIT License.