Process Nested Jsons Using AWS Athena | KTree

Overview

This Article shows how to import a nested json like order and order details in to a flat table using AWS Athena . Athena is the most powerful tool that can scan millions of nested documents on S3 and transform it to flat structure if needed. Athena will automatically scale up the required CPU to process it without any human intervention.

In this article we will first take some sample nested JSON data structure which we will transform to flat structure.

Example Json Data Structure

The following is the nested json structure which can exhibit data of order and orderliness .
Note : Athena reads the file in the following format only.

Its converted to this Flat structure

Step 1: Upload File To S3

Once data file is ready you can import to any S3 bucket using s3 upload file option from AWS Console . Following is the screenshot of file uploaded to s3.

Step 2: Access Orders Data Using Athena

This step needs to be careful while creating Athena structure for the provided data file . Athena looks like a relational table structure but it will not store any data. It will directory query the file at run time and provide the result.

Create database in athena with following query like traditional sql query.

“Create database testme”

Once database got created , create a table which is going to read our json file in s3.

Step 3: Create Athena Table Structure for nested json along with the location of data stored in S3

Create table and access the file

Athena has good inbuilt support to read these kind of nested jsons. Following is the schema to read orders data file.

CREATE EXTERNAL TABLE `test_orders1`(
  `details` array<struct<orderno:int,detailno:int,cost:double,partnumber:string,linecode:string,qty:int>> COMMENT ‘from deserializer’, 
  `header` struct<orderno:int,orderby:string,amount:double,orderdate:string> COMMENT ‘from deserializer’)
ROW FORMAT SERDE 
  ‘org.openx.data.jsonserde.JsonSerDe’ 
STORED AS INPUTFORMAT 
  ‘org.apache.hadoop.mapred.TextInputFormat’ 
OUTPUTFORMAT 
  ‘org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat’
LOCATION
  ‘s3://test-bucket/load’
TBLPROPERTIES (
  ‘transient_lastDdlTime’=’1558949443’)

Explanation

Details:

is an array object contains 2 jsons in it. So represented same in the structure.
`details`

array<struct<orderno:int,detailno:int,cost:double,partnumber:string,linecode:string,qty:int>>

Header :

Schema for header info loading

  `header` struct<orderno:int,orderby:string,amount:double,orderdate:string>

Location:

‘s3://test-bucket/load’

Which is pointing to s3 bucket to access the file. We can not specify file directly here

If schema defined well table will create successfully.
Based on the schema the following is the table structure in Athena

Step 4: Preview Table Data

If You are table is created successfully in AWS Athena now we should be able to see the data from S3 data file which we provided to do that click on the menu (three dotted icon) click on preview table

You can see the data is displaying on results console. Now we have to view the results in a flat view so follow the below steps to access inner objects/fields.

Step 5: Now Let’s show our data in two columns header and detail

Here “item” means nested array contains order details . In order to split arrays in to rows we need to use CROSS JOIN in conjunction with unnest operator.

Data Format after first level transformation:

Use the following query to access

SELECT

  header,

  item

FROM

  (test_orders1

CROSS JOIN UNNEST("details") t (item))

Explanation:

Header — header json object from Json File.
Item — Item object is refers to details array in json record.

5b. Finally transform header and details objects as flat columns.

SELECT

  header.orderno,

  header.orderby,

  header.amount,

  header.orderdate,

  item.partnumber,

  item.qty,

  item.cost

FROM

  (test_orders1

CROSS JOIN UNNEST("details") t (item))

Output:

Now each orderline represented in flat view.

5c. If you want to get the total cost of the order by partnumber we can use the below query.

SELECT

  header.orderno,

  header.orderby,

  header.orderdate,

  item.partnumber,

  (item.qty*item.cost) as totalcost

FROM

  (test_orders1

CROSS JOIN UNNEST("details") t (item))

Export to CSV

Once result set is ready we can export to a csv file.

Csv file;

Athena has flexibility to export big result sets to a csv file.

Creating Sequences Dynamically for documents, When a Company Is Created in Odoo 18

In Odoo, sequences are used to generate unique identifiers for various documents such as orders, invoices, and quotations. While Odoo allows you to manually configure sequences, it’s often necessary to create default sequences dynamically, especially when dealing with...

Multithreading in Odoo18: Running Background Tasks Efficiently

Introduction Odoo operates in a single-threaded environment, meaning that all processes execute sequentially. However, some operations, such as long-running computations, API calls, and large data imports, can block the system and slow down user interactions....

Creating Custom Pages in Odoo 18 Using Controllers and XML

Introduction Odoo provides a powerful framework for creating custom web pages through a combination of controllers (Python) and templates (XML). This article demonstrates how to create a custom web form in Odoo 18 using a real-world example: a Real Estate Channel...

Preventing Delivery Creation in Odoo18 Sale Orders Using Context

Understanding the Default Behavior: In Odoo, the default behavior upon confirming a sale order is the automatic creation of a delivery order (picking) for the associated products. This process is facilitated by the stock rules mechanism, and the delivery is triggered...

Odoo 18 OWL Framework Tutorial: Building a Product Color Selector Beginners Guide

OWL (Odoo Web Library) is Odoo’s modern web framework for building user interface components. It’s a lightweight, component-based framework that provides:

Bulk Updating Sale Order Status Based on Customer in Odoo 17 ERP

In Odoo, we can enhance sales order management by adding a menu that allows bulk updating of sale order states based on the selected customer. This feature simplifies order processing by fetching all orders/ quotations for a customer and enabling bulk confirmation of orders. This also can be used for any other action that you want to perform in bulk for Sales orders.

KTree offers Open Source Development and consulting services in numerous countries including the USA, UK, Australia, Canada, India, Europe, and the Middle East (Kuwait, Qatar, Russia, France, Italy, Spain, Cyprus, Turkey, UAE, Dubai, Saudi Arabia), as well as the Philippines, Netherlands, Brazil, Indonesia, Malaysia, Singapore, Japan, China, Luxembourg, BeNeLux, Belgium, Ireland, Iceland, and Switzerland.