Pulling out IDs from objects and back again -- Part II

2023-12-29 jq

The last post about jq was about converting data in object keys to attributes and back again. Because objects usually have multiple keys, this also implies converting between one object and an array of objects.

This time the data is readily available as attributes and can be addressed with a simple .attr expression. Also, the (initial) focus is a single object. But even then there are some nifty things to explain.

One Object

Given this example input:

{
	"id": "foo",
	"data": {
		"x1": 42,
		"x2": 127
	}
}

the expected result is:

{
	"id": "foo",
	"x1": 42,
	"x2": 127
}

The most direct way to do this is this filter:

.data.id = .id | .data

although the id attribute is added at the end. That is semantically the same JSON, but it might occasionally annoy a human reader (like me). But there’s an easy fix for this:

{id} + .data

And it’s shorter too!
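To see the difference in key order, both filters can be run against the example input. This is a minimal sketch, assuming jq is installed and a POSIX shell is used; the variable names are only for illustration.

```shell
input='{"id":"foo","data":{"x1":42,"x2":127}}'

# Assign-then-extract: id is appended after the existing keys of .data
assigned=$(printf '%s' "$input" | jq -c '.data.id = .id | .data')

# Merge: the keys of the left operand come first
merged=$(printf '%s' "$input" | jq -c '{id} + .data')

echo "$assigned"   # {"x1":42,"x2":127,"id":"foo"}
echo "$merged"     # {"id":"foo","x1":42,"x2":127}
```

One difference to keep in mind: with +, the right-hand object wins on duplicate keys. If .data already contained an id, {id} + .data would keep the inner value, while the assignment form would overwrite it with the outer one.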

If the key id should be renamed in the resulting object, the shorthand {id} must be expanded like this:

{new_id: .id} + .data

So merging objects using + allows me to control the order of the keys.
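A small sketch of both points, again assuming jq and a POSIX shell: the first filter renames the key, the second swaps the operands so that id ends up last.

```shell
input='{"id":"foo","data":{"x1":42,"x2":127}}'

# Rename the key while merging
renamed=$(printf '%s' "$input" | jq -c '{new_id: .id} + .data')

# Swapping the operands moves id to the end
id_last=$(printf '%s' "$input" | jq -c '.data + {id}')

echo "$renamed"   # {"new_id":"foo","x1":42,"x2":127}
echo "$id_last"   # {"x1":42,"x2":127,"id":"foo"}
```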

The reverse operation is this:

{ id, data: del(.id) }

This time there are no more tricks up the sleeve.
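As a quick check, the two filters can be chained into a round trip. This is a sketch assuming jq and a POSIX shell; the variable names are made up for the example.

```shell
flat='{"id":"foo","x1":42,"x2":127}'

# Pull id out and wrap the remaining keys in data
nested=$(printf '%s' "$flat" | jq -c '{ id, data: del(.id) }')

# Push id back down again
roundtrip=$(printf '%s' "$nested" | jq -c '{id} + .data')

echo "$nested"      # {"id":"foo","data":{"x1":42,"x2":127}}
echo "$roundtrip"   # {"id":"foo","x1":42,"x2":127}
```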

With Arrays

If data is not an object itself but an array of objects, then pushing id down is just as easy. Given this input:

{
    "id": "foo",
    "data": [
        {
            "x1": 42,
            "x2": 127
        },
        {
            "x1": 123,
            "x2": 456
        }
    ]
}

the expected output is this:

[
  {
    "id": "foo",
    "x1": 42,
    "x2": 127
  },
  {
    "id": "foo",
    "x1": 123,
    "x2": 456
  }
]

This can be done easily using this filter:

[ {id} + .data[] ]
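The filter works because .data[] emits one value per array element, and {id} + ... is evaluated once for each of them. A minimal sketch to try it, assuming jq and a POSIX shell:

```shell
input='{"id":"foo","data":[{"x1":42,"x2":127},{"x1":123,"x2":456}]}'

# .data[] yields each element; {id} is merged in front of every one
pushed=$(printf '%s' "$input" | jq -c '[ {id} + .data[] ]')

echo "$pushed"
# [{"id":"foo","x1":42,"x2":127},{"id":"foo","x1":123,"x2":456}]
```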

At first glance the reverse operation is also easy:

{
    id: .[0].id,
    data: map(del(.id))
}

Indeed this reverses the output given above to its original input. But clearly it only works correctly when all id values are identical.
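The limitation is easy to demonstrate. In this sketch (jq and a POSIX shell assumed, sample data chosen for illustration) the third element has a different id, which is silently dropped because only .[0].id is consulted:

```shell
mixed='[{"id":"foo","x1":42},{"id":"foo","y1":123},{"id":"bar","z1":999}]'

# Only the first id survives; "bar" is lost without any error
collapsed=$(printf '%s' "$mixed" | jq -c '{ id: .[0].id, data: map(del(.id)) }')

echo "$collapsed"
# {"id":"foo","data":[{"x1":42},{"y1":123},{"z1":999}]}
```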

But what if the id values are not identical?

So given this input

[
  {
    "id": "foo",
    "x1": 42
  },
  {
    "id": "foo",
    "y1": 123
  },
  {
    "id": "bar",
    "z1": 999
  }
]

the expected output is this:

[
  {
    "id": "bar",
    "data": [
      {
        "z1": 999
      }
    ]
  },
  {
    "id": "foo",
    "data": [
      {
        "x1": 42
      },
      {
        "y1": 123
      }
    ]
  }
]

A simple solution is to group the input array by .id and apply the above filter to each group like this:

group_by(.id) | map({id: .[0].id, data: map(del(.id))})
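Run against the input above, this produces the expected output. Note that group_by sorts the groups by the grouping key, which is why "bar" comes before "foo". The sketch assumes jq and a POSIX shell:

```shell
mixed='[{"id":"foo","x1":42},{"id":"foo","y1":123},{"id":"bar","z1":999}]'

# Group by id, then pull the id out of each group
grouped=$(printf '%s' "$mixed" \
  | jq -c 'group_by(.id) | map({id: .[0].id, data: map(del(.id))})')

echo "$grouped"
# [{"id":"bar","data":[{"z1":999}]},{"id":"foo","data":[{"x1":42},{"y1":123}]}]
```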

There is a very similar filter in the last post, which could be optimized into a single reduce filter. That’s not the case here! The difference is that the reduce filter produces an object whose keys themselves contain the data, and reduce also uses these keys to group the data. But the output format in this case requires additional postprocessing anyway, so I think the group_by|map solution is the best one here.

Note that I present this solution only for completeness’ sake, because I see no practical use for this output format. Either I’d use the output of pullout_groups_by, where I can access the data directly with a key. In this case the additional data container is just noise. Or I’d use the output of group_by(.id) directly.

Summary

Using the + expression avoids the clumsy “assign and extract value” idiom. Controlling the order of the attributes is a nice extra on top of that.