iTunes Wrapped: Elasticsearch + XML + Claude MCP

9 minute read

Back in the Day We Had Music Collections…

Repo here for the impatient…

Remember when you used to download your music, or even rip albums from CDs and “borrow” USB sticks? Well like everyone I went through that heady time. I went a bit further than perhaps most - I also spent several years meticulously rating my music in iTunes which if you can believe, included a star rating system.

These days are long since gone. Like everyone else I just mash buttons in Spotify and let the algorithm guide my listening. The result of this is that when Spotify Wrapped comes around each year, my analytics are a little underwhelming - they’re all some version of “here’s how you did or didn’t listen to music etc.”

Golden age iTunes was different - instead of telling you what to listen to, it baffled you with complex configurable browser panes and algorithmic auto-playlists:

auto-playlist

What is known by very few people indeed: to this day, Apple’s ‘Music’ FKA iTunes has a file called iTunes Music Library.xml which acts as a thorough manifest for your whole library:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Major Version</key><integer>1</integer>
	<key>Minor Version</key><integer>1</integer>
	<key>Application Version</key><string>12.9.5.5</string>
	<key>Date</key><date>2020-03-03T20:46:48Z</date>
	<key>Features</key><integer>5</integer>
	<key>Show Content Ratings</key><true/>
	<key>Library Persistent ID</key><string>A7DD0EF2007C7820</string>
	<key>Tracks</key>
	<dict>
		<key>2534</key>
		<dict>
			<key>Track ID</key><integer>2534</integer>
			<key>Size</key><integer>4506241</integer>
			<key>Total Time</key><integer>295653</integer>
			<key>Track Number</key><integer>2</integer>
			<key>Year</key><integer>2004</integer>
			<key>Date Modified</key><date>2007-04-20T10:33:48Z</date>
			<key>Date Added</key><date>2011-11-30T11:38:10Z</date>
			<key>Bit Rate</key><integer>144</integer>
			<key>Sample Rate</key><integer>22050</integer>
			<key>Skip Count</key><integer>1</integer>
			<key>Skip Date</key><date>2015-09-22T11:50:47Z</date>
			<key>Persistent ID</key><string>754426F55BBBC080</string>
			<key>Track Type</key><string>File</string>
			<key>File Folder Count</key><integer>5</integer>
			<key>Library Folder Count</key><integer>1</integer>
			<key>Name</key><string>Install A Beak In The Heart That Clucks Time In Arabic</string>
			<key>Artist</key><string>65daysofstatic</string>
			<key>Album Artist</key><string>65daysofstatic</string>
			<key>Album</key><string>The Fall Of Math</string>
			<key>Genre</key><string>Other</string>
			<key>Kind</key><string>MPEG audio file</string>
			<key>Sort Album</key><string>Fall Of Math</string>
			<key>Location</key><string>file:///Users/roryscott/Music/iTunes/iTunes%20Media/Music/65daysofstatic/The%20Fall%20Of%20Math/02%20Install%20A%20Beak%20In%20The%20Heart%20That%20Clucks%20Time%20In%20Arabic.mp3</string>
		</dict>
		<!-- Many more tracks -->
  <!-- Albums, Playlists, etc. -->
</xml>

I was feeling nostalgic for these days, so I decided to see how far I could go towards ‘Spotify Wrapped’ but for iTunes, and the answer is pretty far.

Dashboard and Graphs

For the traditional dashboard stuff - graphs, etc. - I opted for Elasticsearch + Kibana:

dashboard

The dashboard automatically processes the whole XML file and does some reorganising on the way in to create a few different indexes.

Qualitative Dialogue

For the more qualitative ‘you listen to x in y way’ stuff, I created a local Elastic MCP for Claude:

claude-mcp-dialogue

My Reflections

I’m pretty astounded by how quickly I could get this up and running - I used Claude Code extensively and it was unbelievably helpful, particularly for the Docker orchestration and creating the search indexes. I go into more implementation detail below for anyone curious.

And in terms of results - I’m really impressed, particularly with how fast and performant Claude+MCP is in doing analytical work on the Elastic instance. Check out this dialogue for example:

> **me**
> Have a deep look at the structure and available data within elasticsearch and think about some interesting insights which I might not have considered

> **claude+MCP** 
> I'll take a deep dive into your Elasticsearch data structure to uncover some interesting insights. Let me start by examining all the available indices and their contents.
> 
> I see four main indices related to your music library: tracks (15,468 documents), albums (1,345 documents), artists (812 documents), and genres (158 documents). Let me examine each of these in more detail.
> 
> First, let's look at the tracks index structure:
> Now let's check the albums index:
> Let's check the artists index:
> And finally, the genres index:
> Now that I understand the data structure, let me explore some potentially interesting insights:
> 
> 1. First, let's look at your most played genres:
> 2. Let's look at your listening habits over time - when do you add new music?
> 3. Let's look at your listening habits throughout the day:
> Let's try with a more specific query that might work better:
> 4. Let's look at your top rated artists:
> 5. Let's discover your most "skipped" tracks relative to play count:
> Let's try another approach to get frequently skipped tracks:
> 6. Let's look at patterns in your listening behaviour - tracks with high play counts but not rated:
> 7. Let's look at seasonal listening patterns - what's your summer music?
> 8. Let's look at your highest rated but least played tracks (hidden gems):
> 
> Based on my exploration of your Elasticsearch data, here are some interesting insights you might not have considered:
> 
> ### 1. Genre Listening Patterns
> While Hip-Hop (8,452 plays) and Jazz (8,108 plays) are your most played genres overall, Comedy has the highest average play count per track (8.37). This suggests you repeatedly return to your favourite comedy tracks more than any other genre, even though you have far fewer comedy tracks (155) compared to Jazz (2,081) or Hip-Hop (1,706).
> 
> ### 2. Listening Growth Timeline
> Your music collection shows significant growth spurts in specific years. 2011 saw the addition of 4,172 tracks, and 2015 saw 3,550 new tracks. These years might represent periods where you discovered new music services, expanded your taste, or migrated collections from other platforms.
> 
> ### 3. Hidden Treasures
> You have 1,639 tracks rated highly (80+ rating) but played 5 times or less. Many of these are by John McLaughlin, suggesting you're a fan of his work but haven't fully explored his catalog. These represent "hidden gems" in your collection that you've acknowledged as quality music but haven't spent much time with.
> 
> ### 4. Love-Hate Relationship Tracks
> Interestingly, some of your most-played tracks are also frequently skipped. "Indigo" by Al Di Meola, Jean-Luc Ponty, & Stanley Clarke appears in both your top 10 most played (100 plays) and most skipped (14 skips) lists. This suggests a complex relationship with certain tracks – perhaps you love portions of them but find other sections less appealing.
> 
> ### 5. Rating Consistency
> All your top-rated artists have a perfect 100 rating, but their play counts vary dramatically. For example, David Rees-Williams has 295 plays across 11 tracks, while Cinematic Orchestra has only 10 plays across 62 tracks. This suggests you might rate artists highly based on reputation or a few standout tracks rather than consistent listening.
> 
> ### 6. Collection Composition
> Your track collection (15,468 tracks) contains a significant number of podcasts (1,414 tracks), which function differently from music tracks in terms of replay value. This might skew some of your listening statistics, as podcasts typically have lower replay rates than music.
> 
> ### 7. Album Completion
> The presence of 119 deleted documents in your albums index might indicate incomplete collections where you've removed certain tracks from albums, potentially suggesting you're selective about keeping only favourite tracks from some albums.
> 
> ### 8. Detailed Tagging
> Your meticulous music library includes fields like BPM, explicit content flags, and detailed composer information, suggesting you value organized metadata. However, there are still 922 tracks tagged as "Unknown" genre, representing a portion of your collection that might benefit from better categorization.
> 
> Would you like me to explore any of these insights in more depth?

Approach

The entire setup runs in Docker containers, with Elasticsearch handling the data ingestion and Kibana providing the visualization layer. Nothing gets installed locally on my machine - everything is isolated in containers or just files like the MCP server (which is accessed and run from clients rather than served).

graph TD
    User --> KB[Kibana Dashboard]
    User --> Claude[Claude Interacts with Data]
    
    Main[Python Main App] -- Setup --> KB
    
    Claude --> MCP[Elasticsearch MCP Server]
    MCP --> ES[(Elasticsearch)]
    
    KB -- Visualize data --> ES
    
    subgraph Docker Container Environment
        ES
        KB
    end
    
    subgraph Data Flow
        XML[iTunes XML File] --> Main
        Main --> Tracks[Tracks Index]
        Tracks --> ES
        Main --> Artists[Artists Index]
        Artists --> ES
        Main --> Albums[Albums Index]
        Albums --> ES
        Main --> Genres[Genres Index]
        Genres --> ES
    end

The containerization approach means I can easily share this with friends or deploy it anywhere Docker runs. They just drop in their own iTunes XML export, run a single command from the Makefile, and get their own personalized music analytics dashboard.

Python + Elasticsearch

Claude helped me scaffold the whole system, from the Docker Compose setup to the Python code that parses the XML and feeds it into Elasticsearch. The clean separation between components makes everything modular and easy to understand.

The most impressive part was seeing how Claude + Elasticsearch handles the complex XML structure without much issue. It was really straightforward to generate quite a complex index which allows for fun queries, like this one which shows the top genres and their play count statistics:

curl -X GET "http://localhost:9200/tracks/_search" -H "Content-Type: application/json" -d'
  {
    "size": 0,
    "aggs": {
      "genre_stats": {
        "terms": {
          "field": "genre.keyword",
          "size": 5
        },
        "aggs": {
          "total_plays": {"sum": {"field": "play_count"}},
          "avg_plays": {"avg": {"field": "play_count"}},
          "avg_time_ms": {"avg": {"field": "total_time"}},
          "avg_time_min": {
            "bucket_script": {
              "buckets_path": {"avgTime": "avg_time_ms"},
              "script": "params.avgTime / 60000"
            }
          }
        }
      }
    }
  }' -u elastic:<my-password-here>

output:

{
  "took": 91,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "genre_stats": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 5636,
      "buckets": [
        {
          "key": "Jazz",
          "doc_count": 2081,
          "avg_plays": {
            "value": 3.8962037481979817
          },
          "avg_time_ms": {
            "value": 380857.4483421432
          },
          "total_plays": {
            "value": 8108.0
          },
          "avg_time_min": {
            "value": 6.34762413903572
          }
        },
        {many more}
      ]
    }
  }
}

Getting the dashboard to render with the correct time range was one of the few tricky parts. I needed the dashboard to default to showing all my music history (about 100 years worth), but Kibana’s default 15-minute window wasn’t cutting it. The solution was modifying the API calls to include custom time ranges at the dashboard level.

Model Completion Protocol (MCP) Integration

The implementation of this was… involved… it’s fair to say I went down a rabbit-hole but have emerged with an elegant solution I think. Quoting the readme…

To register the Elasticsearch MCP in your Claude desktop app and copy into a Claude Code instance, run sh scripts/create_local_elasticsearch_mcp.sh.

This will download a local copy of the Elasticsearch MCP server at version 1.0.0 and generate configuration JSON which can be used to install your new Elastic instance as a local MCP server.

The JSON looks like:

{
  "mcpServers": {
    "elasticsearch-local": {
      "command": "$UVX_PATH",
      "args": [
        "--directory",
        "$ELASTIC_MCP_PATH",
        "elasticsearch-mcp-server"
      ],
      "env": {
        "ELASTIC_HOST": "http://localhost:9200",
        "ELASTIC_USERNAME": "elastic",
        "ELASTIC_PASSWORD": "$PASSWORD"
      }
    }
  }
}

Once you’ve done the slightly annoying step of registering that in claude_desktop_config.json you can use it in Claude Desktop as seen above, or register it in Claude Code like this:

mcp-import-from-desktop.png

This makes it possible to use natural language to explore my music data, and in this instance, learn that I am significantly older than I thought:

query

The Build Process

The whole system follows a fairly clean workflow:

  1. The iTunes XML file gets parsed by Python (using the standard library)
  2. Data is transformed and loaded into Elasticsearch
  3. Kibana visualizations and dashboards are created via API calls
  4. Everything is accessible through a browser at http://localhost:5601

What I love about this approach is that there’s no manual setup - everything happens automatically when you run the setup script. Even the Kibana dashboard with all its visualizations gets created programmatically. This makes it incredibly easy to reset everything and start fresh if needed.

Next Steps

I want to try this on other peoples’ old iTunes XML files - I suspect it might fall over as I don’t know how standardised the files are over time, but if you have an old un-loved iTunes you paid a lot of attention to, please try to spin this up and play around with it, and maybe even put some issues in the repo if it doesn’t work!