One of the things we have to deal with in note-based version identification is that two versions may be transposed in pitch; that is, they are in different keys, and every note has been shifted up or down by the same number of halfsteps (semitones). Since we're using chroma features to summarize the notes, we should explore the effect that this transposition has on the chroma features.
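Before looking at real audio, here is a minimal sketch of the idea (a toy example of my own, not audio from the song): transposing by k halfsteps is just addition modulo 12 on pitch classes, and once the notes are collapsed into a 12-bin chroma histogram, that addition becomes a circular shift of the bins.
import numpy as np

# Pitch classes, indexed 0-11: C=0, C#=1, ..., B=11
names = np.array(['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'])

melody = np.array([0, 4, 7, 4, 0])      # a toy melody: C E G E C
k = 1                                   # transpose up by one halfstep
shifted = (melody + k) % 12             # C# F G# F C#
print(names[melody], names[shifted])

# Collapsing the notes into a 12-bin "chroma" histogram turns the
# transposition into a circular shift of the bins
chroma = np.bincount(melody, minlength=12)
chroma_shifted = np.bincount(shifted, minlength=12)
print(np.array_equal(chroma_shifted, np.roll(chroma, k)))   # True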
The example below is not a different version per se, but it is a perfect illustration of transposition. In the song "Love on Top," Beyonce repeats the chorus near the end of the song, transposing it up by a single halfstep each time (four key changes in all). We see that the chroma features look very similar from one repeat to the next, except that they have been shifted up by one row each time.
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
import IPython.display as ipd
import warnings
warnings.filterwarnings("ignore")

def show_clip(c):
    """Load clip c, plot its chroma features, and return an audio player."""
    y, sr = librosa.load("{}.mp3".format(c))
    # Each of the 12 chroma rows collects energy from every octave of one pitch class
    C = librosa.feature.chroma_cqt(y=y, sr=sr)
    plt.figure(figsize=(12, 4))
    librosa.display.specshow(C, y_axis='chroma', x_axis='time')
    plt.title('Clip {}'.format(c))
    plt.colorbar()
    return ipd.Audio(y, rate=sr)

show_clip(1)
show_clip(2)
show_clip(3)
show_clip(4)
This is the last shift that Beyonce actually performs in the song.
show_clip(5)
Below, I synthesized a few more halfstep shifts to show more clearly how this leads to a circular shift of the chroma features: since all octaves of a note are collapsed into a single equivalence class (one row of the chroma), notes that are shifted past the top row (B) simply wrap around to the bottom (C). A quick numerical check of this appears after the last clip.
show_clip(6)
show_clip(7)
show_clip(8)
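Here is a rough numerical check of the circular-shift claim (a sketch of my own, not part of the original analysis). Assuming clip c sits c - 1 halfsteps above clip 1, as the pattern above suggests, its chroma averaged over time should look roughly like clip 1's average chroma rolled up by c - 1 rows; averaging over time sidesteps the fact that the performances don't line up frame for frame, so this is only approximate. The helper mean_chroma is defined just for this check, and we'd expect the dot product against the rolled reference to be noticeably higher than against the unrolled one.
import numpy as np
import librosa

def mean_chroma(c):
    """Time-averaged, unit-normalized chroma vector for clip c."""
    y, sr = librosa.load("{}.mp3".format(c))
    C = librosa.feature.chroma_cqt(y=y, sr=sr)
    v = C.mean(axis=1)
    return v / np.linalg.norm(v)

ref = mean_chroma(1)
for c in range(2, 9):
    v = mean_chroma(c)
    # Similarity to clip 1's chroma rolled up by (c - 1) rows vs. unrolled
    print(c, np.dot(v, np.roll(ref, c - 1)), np.dot(v, ref))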