โ06-05-2024 06:57 AM
I have an API which i call on a schedule, and it returns me all status updates since the last poll.
Due to API limits etc... I do this on a 15 minute interval.
However, if the item gets updated more than once within this interval, I get the same item returned an equal amount of times, but with different values.
The return I get, after flattening the result and putting it into a list is something like this:
Order | Status | Arrival Timestamp | Departure Timestamp |
ORD-112200 | PLANNED | NULL | NULL |
ORD-112200 | BOOKED | NULL | NULL |
ORD-112200 | BOOKED | 2024-06-05 10:00:00 | NULL |
ORD-112200 | BOOKED | 2024-06-05 10:00:00 | 2024-06-05 11:00:00 |
ORD-112201 | PLANNED | NULL | NULL |
I would like to upsert this into a table in a SQL database, but can't do this in batch as I have multiple instances of the unique column identifier, which is "Order".
So what needs to be done is either merge them in a list, or add something like a rownumber for each order, and then only select the item with the highest rownumber per Order.
I would preferably not want to loop over each item in the list and upsert it into the database, which would have the same result, but would consume a LOT of tasks.
Does anyone have a good solutions for this?
Maybe I could use a dictionary key for each order, or something like that, or SQL collections?
Solved! Go to Solution.
โ06-05-2024 09:01 AM
I would dump the API response into a SQL Collection, then query and group by Order ID. The query response will only include a single result per Order ID, and the rest of the fields will include the data from the first instance of the Order ID found in the collection. I think this works for you, because I assume 1) you always want the newest update for that order ID and 2) the API response is already ordered from the newest to oldest update per order ID. Either way, I would test it out to make sure.
Here's the SQL query:
SELECT * FROM [table data pill] GROUP BY [order ID data pill]
If the above doesn't work, I would dump the API response into a Ruby action and write a quick loop to logically dedupe the list.
Hope this helps!
โ06-05-2024 09:01 AM
I would dump the API response into a SQL Collection, then query and group by Order ID. The query response will only include a single result per Order ID, and the rest of the fields will include the data from the first instance of the Order ID found in the collection. I think this works for you, because I assume 1) you always want the newest update for that order ID and 2) the API response is already ordered from the newest to oldest update per order ID. Either way, I would test it out to make sure.
Here's the SQL query:
SELECT * FROM [table data pill] GROUP BY [order ID data pill]
If the above doesn't work, I would dump the API response into a Ruby action and write a quick loop to logically dedupe the list.
Hope this helps!
โ06-05-2024 09:05 AM
Thanks, I will be trying this!
The only possible issue I'm seeing is that in my response there is a nested list, which I now flatten using .pluck('value')[0] and 1 and 2
But this is a good starting point, I'm trying it tomorrow!
โ06-05-2024 09:51 AM
It should work with your flattened list, but if not we can figure it out. Good luck!
โ06-06-2024 12:13 AM
It worked.
I tried the sql collection yesterday, but the output was weird.
After querying it was OK though.
i did a select * from bookings AS T0 WHERE index = (select max(index) from bookings where transport_number = T0.transport_number)
i added an index in the flattened list, as they are sorted from oldest to newest.
Thanks for the support!