All posts
EngineeringApril 6, 20265 min read

How to extract schema from SQL Server, MongoDB, JSON, XML, or Parquet

Information_schema, sp_help, mongoexport, jsonschema, parquet-tools — the right command for every common data store, in two lines or fewer per format.

By Dawid Sibinski

Schema extraction is one of those things you do every six months and forget the right command for. Here are the commands and queries for the most common data stores, in two lines or fewer per format.

SQL Server

Every-table column dump:

SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE FROM INFORMATION_SCHEMA.COLUMNS ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;

For one table including indexes and constraints: EXEC sp_help 'dbo.MyTable'. For full DDL of an entire database, SSMS → right-click database → Tasks → Generate Scripts is the easy GUI route; mssql-scripter is the CLI equivalent.

MySQL / MariaDB

SHOW CREATE TABLE my_table; for one. For all: query INFORMATION_SCHEMA.COLUMNS the same way as SQL Server. mysqldump --no-data db_name > schema.sql exports schema-only DDL.

PostgreSQL

psql interactive: \d table_name. From SQL: query information_schema.columns. Full schema dump: pg_dump --schema-only db_name > schema.sql.

MongoDB

Mongo doesn't enforce schemas, so "extract schema" means inferring one from sampled documents. mongoexport gives you the docs; tools like variety.js or schema-inferrer build the inferred schema:

// in mongo shell load("variety.js"); db.runCommand({ eval: "variety(db.users, {limit: 1000})" });

Returns each field path with the types it appears as and how often. Crucial for collections that have evolved over time and have inconsistent shapes per document.

JSON

Infer a JSON schema from sample data with the genson library (Python):

from genson import SchemaBuilder b = SchemaBuilder() for doc in samples: b.add_object(doc) print(b.to_schema())

Output is a draft-07 JSON schema. quicktype (CLI) is the JS-ecosystem equivalent and outputs typed code (TypeScript types, Go structs, Rust enums) directly.

XML

If an XSD exists, fetch it from the schemaLocation attribute on the root element. If it doesn't, infer one with xsdgen (Python) or with Trang (Java, also handles RNG and DTD).

Parquet

parquet-tools schema file.parquet from the CLI. From Python:

import pyarrow.parquet as pq pq.read_schema("file.parquet")

Returns nested types (list, struct, map) preserved. For a flat summary, .to_pandas().dtypes works for most analyst needs.

GraphQL

Endpoints with introspection enabled return their full schema via the standard introspection query. apollo client:download-schema or get-graphql-schema (npm) wrap it. For introspection-disabled endpoints, schemas have to be reverse-engineered from observed queries — much harder.

What "schema" means in each

Worth noting: schema means slightly different things across these. SQL gives you fixed columns and types. JSON gives you a probabilistic shape. MongoDB gives you observed shapes. GraphQL gives you a typed contract. Pick the right tool for what "schema" means in your case before reaching for a one-size-fits-all extractor.

More on engineering

Stop reading, start extracting

Drop a PDF or image into ExtractFox and get structured data back in seconds.

Try a free extraction →