I am heavily into Salt infrastructure management at the moment, and I want to leverage all available community-written
formulas. Luckily, the SaltStack group maintains a collection of excellent formulas on their GitHub page,
and they are a great source for states, ideas, best practices, and so on. So I started cloning them, first the ones I really needed. Later I realized
that I might need others in the near future, so why not clone all of them and keep local copies for my development.
The pages are updated fairly regularly, with more and more people contributing to the project, which is great; however, it became tedious to spot new formulas,
and I needed an automated solution to keep up to date with the changes.
Script to Clone SaltStack Formulas from GitHub
I could not find anything off-the-shelf, so I came up with my own solution in Python.
#!/usr/bin/env python
"""
Script to find github hosted SaltStack formulas and make up-to-date
local copies of them.
"""

import urllib2
import re
import os
import subprocess

__author__ = "Ivan Vari"
__credits__ = ["Oliver Drake"]
__license__ = "GPLv3"
__version__ = None
__maintainer__ = "Ivan Vari"

URL = "https://github.com/saltstack-formulas"
PATTERN = r'a href="(/saltstack.*)"\s'
HOME = '{0}/development/saltstack-formulas'.format(os.environ['HOME'])


def main():
    """
    Main function and control flow.
    """
    # print strings with color
    # http://pythonhosted.org/ANSIColors-balises/ANSIColors.html
    colorgrn = "\033[01;32m{0}\033[00m"
    colorwht = "\033[1;37m{0}\033[00m"

    # margin for aligned printing
    width = 50

    # default loop controls
    lastpage = 1
    page = 1

    # until we reach lastpage
    while lastpage >= page:
        page_url = URL + '?page={0}'.format(page)
        connection = urllib2.Request(page_url)
        response = urllib2.urlopen(connection)

        # find all formula-references on the current page
        for match in re.finditer(PATTERN, response.read()):
            url = 'https://github.com' + match.group(1)

            # if 1st page, extract last page id and update loop controls
            if page == 1 and '?page=' in url:
                ref = re.findall(r'page=\d+', url)
                # compare page ids numerically, a string max() would
                # rank page=9 above page=10
                lastpage = max(int(r.split('=')[1]) for r in ref)

            # then ignore the pagination links completely
            if '?page=' not in url:
                # extract the formula name from the URL
                formula = url.split('/')[-1]
                repocopy = os.path.join(HOME, formula)

                # default assumes cloning
                cmd = 'git clone {0} {1}'.format(url, repocopy)
                exec_dir = '/'

                # if local copy found, change exec_dir and method to pull
                if os.path.exists(repocopy):
                    cmd = 'git pull'
                    exec_dir = repocopy

                process = subprocess.Popen(cmd.split(),
                                           stdout=subprocess.PIPE,
                                           stderr=subprocess.STDOUT,
                                           cwd=exec_dir)

                # default output definition
                result = '[OK]'
                flag = ''

                # override output definition based on git output message
                for line in process.stdout.readlines():
                    if 'Cloning' in line:
                        result = '[NEW]'
                        flag = '*'
                    elif 'Updating' in line:
                        result = '[UP]'
                        flag = '+'

                margin = width - (len(result) + len(flag))
                print '[{0}]: '.format(formula).ljust(margin, ' ') + '=>' + \
                    colorwht.format(flag) + colorgrn.format(result)

        page += 1

# boilerplate
if __name__ == "__main__":
    main()
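When run, the script prints one aligned line per formula, with the flag and result highlighted by the ANSI escape sequences. The output looks roughly like this (the formula names here are just illustrative examples, and colors are omitted):

[nginx-formula]:                           =>*[NEW]
[mysql-formula]:                             =>[OK]
[postfix-formula]:                          =>+[UP]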
The script essentially parses the HTML of the organization's front page to work out the available page numbers, then fetches each page individually, filters out all formula references, and runs git clone or git pull on each one depending on whether we already have a local copy of it or not.
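One caveat worth noting: scraping HTML is fragile, since any change to GitHub's markup will break the PATTERN regex. A more robust variant would ask the GitHub v3 REST API for the organization's repository list via https://api.github.com/orgs/saltstack-formulas/repos. The sketch below shows the idea; apart from that endpoint, everything in it (the function name, the User-Agent string) is my own illustration rather than part of the script above, and unauthenticated API requests are rate-limited by GitHub.

#!/usr/bin/env python
"""
Sketch: enumerate saltstack-formulas repositories via the GitHub API
instead of scraping HTML.
"""

import json
import urllib2

API = 'https://api.github.com/orgs/saltstack-formulas/repos'


def list_formula_repos():
    """Yield (name, clone_url) for every repository in the org."""
    page = 1
    while True:
        request = urllib2.Request(
            '{0}?page={1}&per_page=100'.format(API, page))
        # GitHub rejects API requests that carry no User-Agent header
        request.add_header('User-Agent', 'saltstack-formula-mirror')
        repos = json.load(urllib2.urlopen(request))
        if not repos:
            break  # an empty page means there are no more repositories
        for repo in repos:
            yield repo['name'], repo['clone_url']
        page += 1


if __name__ == "__main__":
    for name, url in list_formula_repos():
        print '{0} => {1}'.format(name, url)

The clone-or-pull logic from the script above would slot in unchanged in place of the print statement.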