JSON performance considerations in Oracle databases (best practices)

This article is a white paper“JSON in Oracle Database: Performance Considerations”translation and reading notes.

Purpose

This document outlines performance tuning best practices for JavaScript object notation (JSON) stored and processed in Oracle databases. Applying these best practices will enable developers, database administrators, and architects to proactively avoid performance issues and ensure that the applications and systems they design run at optimal performance.
The hyperlinks in this document provide access to documentation, more information, examples, and free hands-on training.

Introduction to JSON in Oracle Database

In 2014, Oracle released Oracle 12.1.0.2, adding to all Oracle database versionsNative JSON support（This is doubtful, it should be counted after OSON is released.). Prior to this release, JSON was typically stored in NoSQL databases, which lacked the functionality and data consistency model, which forced developers to add additional code to ensure data integrity.

To make up for the lack of NoSQL, developers began using relational databases or other data storage technologies, such as running analytical queries. In 2014, native JSON support functions were added.Eliminate the need for these additional dedicated data storage technologies, significantly speeding up development by reducing integration efforts, simplifying deployment, reducing risks and costs。（Benefits of converged databases) In addition, using standardized SQL operators to store, process, and analyze JSON significantlyShorten the time to get started and reduce the skills required, and enables non-developers to easily handle JSON data.

Oracle databases provide native JSON support. JSON is compatible with all Oracle database features including options, Oracle management packages, frameworks, architectures, and security. JSON stored in Oracle databases can also benefit from the performance, scalability, availability, scalability, portability, and security of Oracle databases. (Utilizing Oracle's powerful capabilities is also a benefit of converging databases) Accessing JSON stored in an Oracle database is the same as accessing other database access methods, including OCI, .NET, and JDBC.

For more information about JSON in Oracle databases, see the JSON Developer Guide.

Performance characteristics and technologies – In-depth discussion

The following section describes the features discussed in the Workload section in more detail:

Store JSON data in Oracle database for best performance

JSON can be stored using columns with data types VARCHAR2, CLOB, BLOB, or JSON. No matter which type you use, you can manipulate JSON data like other types of data.

For Oracle 21c, it is recommended to use native JSON types optimized for queries and efficient (partial) updates. An IS JSON check constraint can be defined on the JSON column to enforce the correct JSON syntax, which can be disabled (rather than deleted) if the application can guarantee the correctness of JSON.
For Oracle 19c, it is recommended to use the native BLOB data type, which is also optimized for queries and efficient updates.
CLOBs are also supported, but should be avoided because CLOBS usually requires twice as much storage space (and disk reads) due to UCS2 encoding.
The VARCHAR2 field is also supported, and you can consider using the VARCHAR2 field if you are known to have the maximum size of the JSON document, or if you prefer the simplicity of VARCHAR2. The VARCHAR2 value can hold up to 32K bytes (the original text is 32, wrong).

Workload type and data access mode

Database workloads can be divided into operational workloads and analytical workloads. Operational workloads, also known as online transaction processing systems (OLTP), are transaction-oriented, have numerous users, and are designed to enable instant response; for example, bank’s automated teller machines (ATMs). The OLTP system supports all data operation types. Typical operations involve transactions that insert or update data using a minimum number of rows. The performance goals of OLTP systems are transaction speed, throughput, and database concurrency. In contrast, analytical workloads such as online analytics processing (OLAP), data warehouses, and data lakes are built for data analytics with fewer users and are designed to process large amounts of data. Typical operations include processing thousands or millions of rows of data using complex resource-intensive queries that join and aggregate data between multiple tables. The OLAP system is optimized for query.

Keyboard Retrieve JSON Documents (OLTP)

Your workload selects a single JSON document based on the relationship column (key) and stores the JSON data in the second (load) column (Can be considered as a value in key/value)middle. The primary key constraint on the key column forces the key value to be unique and creates an index for quick search. If the key is not random (for example, using a sequence or identifying a column), the index may become a hot spot in a highly transactional system, as concurrent/sequent insertion hits the same index block. Index based on key columnHash partitioning（Common ways to disperse hot spots) will distribute the insertion operation evenly to all partitions. The SODA and MongoDB collections automatically have primary key columns—key-based document lookup requires no further action.

Retrieve JSON Documents by Field Value (OLTP)

Here, select one or several documents by the field values in the JSON document. The path expression in the JSON_VALUE or JSON_EXISTS operator defines these values.If you reuse the same path expression, it is recommended to use a function-based index of JSON_VALUE. Indexing field values of interest, you can replace the full table scan of data retrieval with index lookup to ensure optimal performance.

While it is easy to index a single field in a JSON document, array indexing is more challenging. Function-based indexes cannot index array values (the function can only return one value per JSON data); in versions prior to Oracle 21c, materialized views can be used as an alternative: materialized views expand the array into a relational column with multiple row entries and then index it as a normal column. Oracle's comprehensive query rewriting framework automatically rewrites SQL statements for JSON documents for quick data retrieval using materialized views. In Oracle 21c, you can use the new multi-value indexing feature introduced in this release to create indexes natively for values in JSON arrays.

Retrieve JSON documents using full text search (OLTP, OLAP)

Some workloads only know the value of interest, and do not know the path expressions of fields in JSON documents, such as temporary queries to any document. Oracle provides JSON search indexes to improve the performance of such workloads. With JSON search index, the SQL/JSON operator JSON_TEXTCONTAINS allows the selection of rows based on text search conditions, including stemming and fuzzy search.

Extract JSON values for reporting or analysis (OLAP)

In reporting or analytical use cases, JSON data is mapped to the relational model for further processing using SQL. Commonly used SQL operations include joins (with other JSON or relational data), aggregation (summarization, average, window functions), or machine learning (categorization, prediction). SQL/JSON operatorJSON_TABLE allows mapping from JSON to relational models. Oracle database optimizes multiple JSON query operators as much as possible to a single JSON_TABLE statement (displayed in the query execution plan).

For high selectivity analysis (selecting only a small number of JSON documents based on field filtering conditions), you can use index optimization access. If there are many (but not all) rows accessed and the index selectivity is no longer sufficient, then partitioning the data should be considered to remove unrelated partitions from the query. In addition, when dealing with large data volumes, it is recommended to make full use of parallel execution. The SQL/JSON operator JSON_TABLE can be parallelized without any restrictions. (Three optimization methods: index, partition, parallelism）

Suppose you run the same JSON to relational database transformation repeatedly, such as daily reports or dashboard queries. In this case, materializing the view can completely avoid repeated execution of the same JSON_TABLE transformation at runtime by materializing the intermediate results in the view. JSON_TABLE materialized views support quick refreshes, so they can be refreshed efficiently and automatically after insertion or update. Materialized views can also be used in conjunction with Oracle Database In-Memory to benefit from memory column compression and fast SIMD scanning. This can significantly improve performance, especially for analytical queries. (Materialized View (Main) + In-Memory (Super) Optimization）

JSON generation (OLTP, OLAP)

Oracle database has added SQL/JSON operator, (Maybe refer to JSON_ARRAY and JSON_OBJECT) is used to generate new JSON data from relational data and query results. A typical use case is to modify the structure of a JSON document, or return the results of the analysis query as a JSON data extraction. Indexes provide quick access when only a small number of rows are accessed. If JSON generation is based on multi-line data, materialized views should be considered, but it should be noted that the quick refresh feature generated by JSON is only supported in limited cases.

Performance characteristics and technologies – In-depth discussion

The following sections will introduce performance-related features in more detail, with examples attached. generally,Regular SQL tuning techniques are applicable: You can use your existing skills. This shortens the learning curve and eliminates concerns between DBA and database administrators about adopting JSON in Oracle databases. The main idea behind these tuning techniques is to reduce the amount of data that needs to be read and processed:

Function-based indexing

Function-based indexes can be created based on specific keys or key combinations and optimize query operations for the same key using the SQL/JSON operator. Function-based indexes are built using the JSON_VALUE operator and support bitmap and B-tree index formats.

The following example creates a (unique) function index on the PONumber key of the example JSON document, which is accessible via the path expression "$.PONumber". This example assumes that JSON data is stored in the "data" column of a table named "purchaseorder".

create unique index PO_NUMBER_IDX on PURCHASEORDER po(
json_value(, '$.PONumber' returning number
null on empty error on error));

The PONumber value will be extracted (and indexed) as a number. This affects range queries (sorted by numbers rather than alphabetical) and avoids data type conversions that do math or compare at runtime. Missing values will be indexed as SQL NULL values.

The following query uses simplified JSON syntax. Since the number() item method is used, the index will be used for data retrieval, as shown in the plan.

select data from PURCHASEORDER po
where () = 200;
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | | 
| 1 | TABLE ACCESS BY INDEX ROWID| PURCHASEORDER | 
|* 2 | INDEX UNIQUE SCAN | PO_NUMBER_IDX |
-----------------------------------------------------

Multi-value index

If the path expression can select multiple values, it is recommended to use multi-value indexing—which is common when accessing values in JSON arrays. The following code creates a multi-value index on the field "UPCode" in the JSON array "LineItems" of the example JSON document. These values are indexed as strings.

create multivalue index UPCCODE_INDEX on PURCHASEORDER po
( ());

Multi-value indexes also use B-tree, but because the generated ROWID needs to be deduplicated, it is slightly slower than the function index. Therefore, if a path expression is known to return at most one value, a function-based index should be preferred. Multi-value indexing was introduced in Oracle 21c (for earlier reasons, materialized views can be used to speed up access to arrays).

The following query uses multi-value indexes:

select data from PURCHASEORDER po where
() = '13131092705'; 
-------------------------------------------------------------
| Id | Operation 							| Name 			| 
|  0 | SELECT STATEMENT 					| 				| 
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED	| PURCHASEORDER | 
|* 2 | INDEX RANGE SCAN (MULTI VALUE) 		| UPCCODE_INDEX |
-------------------------------------------------------------

JSON Search Index

Oracle databases support the use of search indexes based on Oracle full-text indexes to index the entire JSON document. This search index contains not only all values, but also its field names, and allows full text searches. The following example creates a JSON search index on "purchaseorder".

create search index PO_FULL_IDX on PURCHASEORDER po () for json
parameters('SYNC (EVERY "FREQ=SECONDLY; INTERVAL=1") DATAGUIDE OFF');

The "parameters" clause specifies that the index is asynchronous and synchronized once per second. It is also possible to synchronize indexes every time a transaction commits, but this increases index maintenance costs and reduces the throughput of concurrent DML. JSON search indexes can also discover schema changes during DML operations through a feature called JSON Dataguide—for example, it allows automatic generation of JSON_Table views. The DATAGUIDE OFF clause disables this mode discovery feature, reducing the cost of JSON search indexes during DML operations.

The underlying data structure of JSON search index is a publish list, which is usually slower than a B-tree index. If JSON search indexes are used with function-based indexes or multi-value indexes, the optimizer will prioritize these indexes as much as possible. Since JSON search indexes index the entire JSON data, its size will be significantly larger than other indexes, usually around 20%-30% of the original data. JSON search index supports values in JSON arrays and full-text search operations. The following example selects documents that contain both the words "Magic" and "Christmas". In addition to "{and}", you can also use "{near}" or "{not(…)}". For more information on the JSON search indexing feature, see the documentation

select data from PURCHASEORDER po
where JSON_TEXTCONTAINS(, '$.', 'Magic
{and} Christmas');

The query execution plan displays the JSON search index as a "domain index":

---------------------------------------------------------
| Id  | Operation 						| Name			|
---------------------------------------------------------
| 	0 | SELECT STATEMENT 				| 				|
| 	1 | TABLE ACCESS BY INDEX ROWID		| PURCHASEORDER |
|* 	2 | DOMAIN INDEX 					| PO_FULL_IDX	|
---------------------------------------------------------

For workloads that contain a large number of DML operations, it may be helpful to use a single JSON search index to replace a large number of function indexes and multi-value indexes, thereby reducing index maintenance (index synchronization after DML operations). For more optimization strategies, please refer to the blog quoted on the right.

Materialized view

You can use materialized views to improve the performance of frequently accessing multiple rows of queries (rather than index-driven key-value lookups). The materialized view will persist the query results. Subsequent queries that partially or fully match materialized view queries can access materialized data without rerunning the original query (Trade space for speed）。

In this document, we focus mainly on JSON_TABLE materialized views. The following code creates a materialized view containing the values of the "LineItems" array in the example JSON document. As mentioned earlier, using materialized views can index JSON array values in Oracle 19c, because this version does not support multi-value JSON indexing.

create materialized view PO_MV build immediate
refresh fast on statement with primary key as
 select , jt.* from PURCHASEORDER po, json_table(, '$' error on error null on empty
 columns (
 	 po_number NUMBER PATH '$.PONumber', 
 	 userid VARCHAR2(10) PATH '$.User', 
 	 NESTED PATH '$.LineItems[*]'
 	 	columns ( itemno NUMBER PATH '$.ItemNumber',
 	 	description VARCHAR2(256) PATH '$.',
 	 	upc_code NUMBER PATH '$.',
 	 	quantity NUMBER PATH '$.Quantity',
 	 	unitprice NUMBER PATH '$.')))
jt;

Converting array values to multiple rows in our materialized view allows us to create additional (supportive) indexes on fields in a JSON array as shown below:

CREATE INDEX mv_idx ON PO_MV(upc_code, quantity);

SQL/JSON queries on the base table are now transparently rewritten to use materialized views and their indexes as much as possible. The following query is an example where Oracle automatically rewrites the query to use materialized views and their secondary indexes, as shown in the execution plan:

select data from PURCHASEORDER po
where JSON_EXISTS(, '$.LineItems[*]?(@. == 1234)');
-----------------------------------------------------------------
| Id | Operation							| Name 				|
 … 	
| 4 | MAT_VIEW ACCESS BY INDEX ROWID BATCHED| PO_MV 			|
|* 5 | INDEX RANGE SCAN 					| MV_IDX 			|
-----------------------------------------------------------------

To keep the materialized view (MV) synchronized with the underlying data (after the DML operation), we set the materialized view to "Sentence Quick Refresh". This automatically performs the refresh process and always keeps the materialized view and base table data consistent. This article will not discuss in depth the various refresh mechanisms of materialized views. For more details, please refer to the relevant documentation.

Oracle Partition

You can partition tables containing documents as usual for performance: Partitions can subdivide tables and indexes into separate smaller physical objects, so-called partitions. **The data location in the partition table is identified by the partition key. This key can be a relational column or a field in JSON data. **From the application point of view, partitioned tables are exactly the same as non-partitioned tables.

The following example creates a range partition table whose partition keys are extracted from the JSON document stored in the "data" column and uses the JSON_VALUE-based virtual column "po_num_vc";

CREATE TABLE part_j (id VARCHAR2 (32) NOT NULL PRIMARY KEY,
data JSON,
po_num_vc NUMBER GENERATED ALWAYS AS
(json_value (data, '$.PONumber' RETURNING NUMBER)))
PARTITION BY RANGE (po_num_vc)
	 (PARTITION p1 VALUES LESS THAN (1000), 
	 	PARTITION p2 VALUES LESS THAN (2000));

Queries that filter the JSON field "$.PONumber" (the JSON field used as the virtual column partition key) will significantly benefit from Oracle partitioning capabilities: an optimization technique called partition pruning automatically excludes all unrelated partitions, i.e. partitions known to contain no data related to the query.
The following example query only needs to access the first partition, because the query's equality predicate can only find matching records in this partition. This is shown in the execution plan, where both the Pstart column and the Pstop column are 1.

select data from part_j
where json_value (data, '$.PONumber' RETURNING NUMBER) = 500; -----------------------------------------------------------------
| Id 	| Operation 			| Name | Time | Pstart| Pstop |
-----------------------------------------------------------------
| 0 	| SELECT STATEMENT 		| 			| 00:00:01 | 	| 	|
| 1 	| PARTITION RANGE ALL	| 			| 00:00:01 | 1 	| 1 |
|* 2 	| TABLE ACCESS FULL 	| PART_J 	| 00:00:01 | 1 	| 1 |
-----------------------------------------------------------------

Oracle partitioning has multiple mechanisms to partition tables. Due to space reasons, I will not go into details here. For more details, please refer to the documentation. Typically, for larger JSON documents (average > 32kb), using a relational column as a partition key is generally more performant during DML operations than using a JSON_VALUE virtual column, because the latter requires the partition key to be extracted from JSON before writing to the correct partition.

Parallel execution

By using multiple processes to process JSON documents, JSON operations (such as queries or batch updates) can be parallelized. This can make more efficient use of hardware resources and is the key to large-scale data processing.

Large data warehouses should always use parallel execution for good performance。Specific operations in OLTP applications, such as batch operations, can also benefit significantly from parallel execution。
Parallel execution supports query and DML (insert, update). There are several ways to enable and configure parallel execution. For example, Oracle Autonomous Database automatically selects parallelism based on the consumer group selected for the connection. For databases that manually control parallelism, you can enable parallelism at the session level or decorate a single object. For example, the following code enables 8-level parallelism for our table "purchaseorder".

alter table PURCHASEORDER parallel 8;

If parallel execution is used, the execution plan will display the row with "PX".

| 1 | PX COORDINATOR || 2 | PX SEND QC (ORDER)|

Oracle Memory Column Storage

JSON data can be stored in in-memory column storage (IM column storage), thereby improving query performance. JSON values up to 32 KB can be loaded directly with other relational columns and processed in memory. Generally, not all values in JSON documents are related to analysis queries. In this case, memory is more efficiently utilized by simply moving the relevant JSON fields separately in memory: virtual columns or intermediate materialized views can be used.

The following example adds a virtual column to the table "purchaseorder" that extracts the "zipCode" field from the order address. The virtual column has been added and the table has memory processing enabled.

alter table PURCHASEORDER add (ZIP varchar2(4000) generated always as
(JSON_VALUE(data, '$.()')));
alter table PURCHASEORDER inmemory;

The following analysis example query calculates order quantity by zipCode and processes it with fast memory as shown in the execution plan.

select zip, count(1) from PURCHASEORDER group by zip ;
-----------------------------------------------------
| Id 	| Operation 				| Name 			|
-----------------------------------------------------
| 0 	| SELECT STATEMENT 			| 				|
| 1 	| HASH GROUP BY 			| 				|
| 2 	| TABLE ACCESS INMEMORY FULL| PURCHASEORDER |
-----------------------------------------------------

Oracle Exadata Database Cloud Server

Exadata Accelerated JSON Performance: Queries containing tables and index scans can be used toData search and retrieval process offloading to Exadata storage server. For JSON operators (for example, using JSON_VALUE or JSON_EXISTS in the WHERE clause of a query), this uninstall operation willAutomatic and transparent。Maximum 4KBThe JSON document can be offloaded to the Exadata storage server. Larger documents will be processed in the database.

The STORAGE term in the execution plan indicates that the uninstallation is completed:

-----------------------------------------------------
| Id | Operation 				| Name 			|
-----------------------------------------------------
|* 3 | TABLE ACCESS STORAGE FULL| PURCHASEORDER |
-----------------------------------------------------

Oracle Real Application Cluster

Oracle True Application Cluster (RAC) allows customers to run a single Oracle database on multiple servers to maximize availability and accessShared storageImplementationHorizontal scalability。

Using Oracle Real Application Clusters is transparent for processing JSON documents, and any SQL/JSON processing will automatically benefit.

Oracle Sharding

Oracle sharding is also a horizontal scaling technology, but unlike RAC, it adoptsNo sharing architecture. Sharding technology allows JSON documents to be extended to massive data and transactions, and supports data sovereignty. JSON documents are distributed to individual database table shards based on shard keys (can be relationship columns or JSON fields).

The processing of sharded JSON documents is transparent using Oracle sharding technology. For many operations, processing of shard documents is only done on a database with a specific shard, while cross-slice queries will transparently collect and aggregate the result data from all relevant shards.

The JSON performance tuning feature in Oracle database ends here. The following summarizes the performance-related topics of the JSON document storage API (MongoDB collection and SODA collection):

Performance Tips SODA Series

Oracle Database provides APIs that allow access to JSON data in collections: Oracle Database API for MongoDB and simple Oracle Document Access API -SODA. Conceptually, JSON collections store JSON data (called documents) in automatically generated tables (so they can also be accessed through SQL). SODA supports the same storage options as regular tables containing JSON data, and the same advice applies: Use BLOB on Oracle 19c and native JSON types on Oracle 21c.

Users typically use native language drivers (for example, SODA for Java or SODA for Python) to process JSON collections. SODA native language drivers generally provide higher throughput (operands per second) than REST drivers (REST version SODA).

It is recommended to configure the SODA driver as follows:

Enable SODA metadata cache

The SODA driver needs to understand the metadata (column name, type, etc.) of each JSON collection. Enabling metadata caching reduces the number of round trips with the database, thereby increasing latency and throughput.

Enable statement caching

Statement caches improve performance by cached reused executable statements (such as in loops or in repeated-call methods). For Java, statement caching is enabled using JDBC.

For load balancing systems: Turn off DNS cache

Load balancing allows SODA operations to be distributed to different nodes. If DNS caching is enabled, all connections may use the same node, resulting in load balancing failure. For Java, the following system properties should be set: =0

Database performance tuning techniques are also suitable for SODA: For example, SODA collections can be partitioned or sharded, and queries can be accelerated using indexes and/or materialized views. The SODA operation is automatically converted to an equivalent SQL operation: For example, a SODA query is converted to a SELECT statement using the JSON_EXISTS operator in the WHERE clause.

SQL operations can be retrieved from the v$sql database view or SQL operations can be retrieved by enabling logging directly in the SODA driver: In Java, use the standard logging package - it can be enabled for SODA as follows:

java -classpath "..." -=true -= <program>

'=true' Enable logging of SQL statements.
‘’ Defines the path to the configuration file that allows different logging levels: FINEST is the most detailed logging level.

This is the end of this article about JSON performance considerations (best practices) in Oracle database. For more information about JSON content in Oracle database, please search for my previous articles or continue browsing the related articles below. I hope you will support me in the future!

JSON performance considerations in Oracle databases (best practices)

Purpose

Introduction to JSON in Oracle Database

Performance characteristics and technologies – In-depth discussion

Store JSON data in Oracle database for best performance

Workload type and data access mode

Keyboard Retrieve JSON Documents (OLTP)

Retrieve JSON Documents by Field Value (OLTP)

Retrieve JSON documents using full text search (OLTP, OLAP)

Extract JSON values ​​for reporting or analysis (OLAP)

JSON generation (OLTP, OLAP)

Performance characteristics and technologies – In-depth discussion

Function-based indexing

Multi-value index

JSON Search Index

Materialized view

Oracle Partition

Parallel execution

Oracle Memory Column Storage

Oracle Exadata Database Cloud Server

Oracle Real Application Cluster

Oracle Sharding

Performance Tips SODA Series

More information – Link

Extract JSON values for reporting or analysis (OLAP)